I don't think Spark supports this model, where N inputs that depend on the same parent are all computed at once, in a single pass. You can of course cache the parent and filter N times, and do the same amount of work. One problem is: where would the N inputs live? They'd have to be stored if not used immediately, and presumably in any use case only one of them would be used immediately. If you have a job that needs to split the records of a parent into N subsets, and all N subsets are then used, you can do that already -- you are just transforming the parent into one child whose rows hold those N splits of each input row, and then consuming that child. See randomSplit() for maybe the best case, where it still produces N Datasets but can do so efficiently because each one is just a random sample.
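To make the comparison concrete, here is a rough sketch in plain Python (deliberately not Spark code, and not how Spark implements anything). It contrasts filtering a materialized parent N times with one reading of the "one child" idea above, where each row is tagged with its split in a single pass and the tagged child is consumed afterwards. The function names, the `label` helper, and the sample rows are all made up for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def filter_n_times(
    parent: List[int], predicates: Dict[str, Callable[[int], bool]]
) -> Dict[str, List[int]]:
    """N passes over an already materialized ("cached") parent, one per subset."""
    return {name: [row for row in parent if pred(row)]
            for name, pred in predicates.items()}


def tag_once(parent: List[int], label_fn: Callable[[int], str]) -> List[Tuple[str, int]]:
    """One pass: build a single child whose rows carry their split label."""
    return [(label_fn(row), row) for row in parent]


def consume_tagged(tagged: List[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Consume the tagged child, routing each row to its split."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for split, row in tagged:
        groups[split].append(row)
    return dict(groups)


# Hypothetical parent data and split conditions.
rows = list(range(10))
predicates = {
    "small": lambda x: x < 3,
    "medium": lambda x: 3 <= x < 7,
    "large": lambda x: x >= 7,
}


def label(x: int) -> str:
    """Same conditions as `predicates`, expressed as a single labeling function."""
    if x < 3:
        return "small"
    if x < 7:
        return "medium"
    return "large"


by_filter = filter_n_times(rows, predicates)
by_tag = consume_tagged(tag_once(rows, label))

# Both approaches yield the same subsets; they differ only in how many
# times the parent is scanned and in where the results have to live.
print(by_filter == by_tag)
```

Note that the single-pass version only works out because all the splits are consumed together right away, which is the storage question raised above.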
On Sun, Feb 3, 2019 at 12:20 AM Moein Hosseini <moein...@gmail.com> wrote:

> I don't consider it as method to apply filtering multiple time, instead
> use it as semi-action not just transformation. Let's think that we have
> something like map-partition which accept multiple lambda that each one
> collect their ROW for their dataset (or something like it). Is it possible?
>
> On Sat, Feb 2, 2019 at 5:59 PM Sean Owen <sro...@gmail.com> wrote:
>
>> I think the problem is that can't produce multiple Datasets from one
>> source in one operation - consider that reproducing one of them would mean
>> reproducing all of them. You can write a method that would do the filtering
>> multiple times but it wouldn't be faster. What do you have in mind that's
>> different?
>>
>> On Sat, Feb 2, 2019 at 12:19 AM Moein Hosseini <moein...@gmail.com>
>> wrote:
>>
>>> I've seen many application need to split dataset to multiple datasets
>>> based on some conditions. As there is no method to do it in one place,
>>> developers use *filter* method multiple times. I think it can be useful
>>> to have method to split dataset based on condition in one iteration,
>>> something like *partition* method of scala (of-course scala partition
>>> just split list into two list, but something more general can be more
>>> useful).
>>> If you think it can be helpful, I can create Jira issue and work on it
>>> to send PR.
>>>
>>> Best Regards
>>> Moein
>>>
>>> --
>>>
>>> Moein Hosseini
>>> Data Engineer
>>> mobile: +98 912 468 1859 <+98+912+468+1859>
>>> site: www.moein.xyz
>>> email: moein...@gmail.com
>>> [image: linkedin] <https://www.linkedin.com/in/moeinhm>
>>> [image: twitter] <https://twitter.com/moein7tl>