Good point, and I was purposefully vague on that since that is something our community should evolve, imo: this was just an initial proposal :-)
For example: there are multiple ways to do cartesian, and each has its own trade-offs. Another candidate could be, as I mentioned, new methods which can be expressed as sequences of existing methods but would be slightly more performant if done in one shot - like the self-cartesian PR (rough sketch at the bottom of this mail), various types of join (which could become a contrib of its own, btw!), experiments using key indexes, ordering, etc.

Addition into sparkbank or contrib (or something better named!) does not preclude future migration into core ... it is just an initial staging area for us to evolve the api and get user feedback, without necessarily making the spark core api unstable. Obviously, it is not a dumping ground for broken code/ideas ... contributions must go through the same level of scrutiny and rigour before committing.

Regards,
Mridul

On Feb 23, 2014 11:53 AM, "Amandeep Khurana" <ama...@gmail.com> wrote:
> Mridul,
>
> Can you give examples of APIs that people have contributed (or wanted
> to contribute) but you would categorize as something that would go into
> the piggybank-like (sparkbank)? Curious to know how you'd decide what
> should go where.
>
> Amandeep
>
> On Feb 22, 2014, at 10:06 PM, Mridul Muralidharan <mri...@gmail.com> wrote:
> >
> > Hi,
> >
> > Over the past few months, I have seen a bunch of pull requests which have
> > extended the spark api ... most commonly RDD itself.
> >
> > Most of them are either relatively niche cases of specialization (which
> > might not be useful for most cases) or idioms which can be expressed
> > (sometimes with a minor perf penalty) using the existing api.
> >
> > While all of them have non-zero value (hence the effort to contribute, and
> > gladly welcomed!), they extend the api in nontrivial ways and have a
> > maintenance cost ... and we already have a pending effort to clean up our
> > interfaces prior to 1.0.
> >
> > I believe there is a need to keep the exposed api succinct, expressive and
> > functional in spark, while at the same time encouraging extensions and
> > specialization within the spark codebase so that other users can benefit
> > from the shared contributions.
> >
> > One approach could be to start something akin to piggybank in pig to
> > contribute user-generated specializations, helper utils, etc: bundled as
> > part of spark, but not part of core itself.
> >
> > Thoughts, comments?
> >
> > Regards,
> > Mridul
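
P.S. For concreteness, here is a rough sketch of the kind of helper I have in mind for such a contrib module. The names (ContribRDDFunctions, selfCartesian) are placeholders, and I am assuming "self cartesian" here to mean all unordered pairs of distinct elements. The point is only that it can be written today as a sequence of existing RDD methods, while a dedicated one-shot implementation in contrib could skip the indexing and the filter over the full n^2 pairs:

    import org.apache.spark.rdd.RDD

    object ContribRDDFunctions {
      // Enrich RDD through the usual implicit-class pattern, so the helper can
      // live in a contrib/"sparkbank" module without touching core RDD.
      implicit class SelfCartesianOps[T](rdd: RDD[T]) {
        // Composed purely from existing methods: tag each element with an index,
        // take the full cartesian product, then keep each unordered pair once.
        def selfCartesian(): RDD[(T, T)] = {
          val indexed = rdd.zipWithIndex()
          indexed.cartesian(indexed)
            .filter { case ((_, i), (_, j)) => i < j }
            .map { case ((a, _), (b, _)) => (a, b) }
        }
      }
    }

    // Usage (sc is an existing SparkContext):
    //   import ContribRDDFunctions._
    //   sc.parallelize(Seq(1, 2, 3)).selfCartesian()   // (1,2), (1,3), (2,3)

The implicit-class pattern keeps this entirely outside core RDD, which is exactly the property I want for a staging area: such a helper could graduate into core later if it proves widely useful, without the core api having carried it in the meantime.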