Mridul, can you give examples of APIs that people have contributed (or wanted to contribute) but that you would categorize as belonging in something piggybank-like (sparkbank)? Curious to know how you'd decide what should go where.
Amandeep

> On Feb 22, 2014, at 10:06 PM, Mridul Muralidharan <mri...@gmail.com> wrote:
>
> Hi,
>
> Over the past few months, I have seen a bunch of pull requests which have
> extended the Spark API ... most commonly RDD itself.
>
> Most of them are either relatively niche cases of specialization (which
> might not be useful in most cases) or idioms which can be expressed
> (sometimes with a minor perf penalty) using the existing API.
>
> While all of them have non-zero value (hence the effort to contribute, and
> gladly welcomed!), they extend the API in nontrivial ways and carry a
> maintenance cost ... and we already have a pending effort to clean up our
> interfaces prior to 1.0.
>
> I believe there is a need to keep the exposed API succinct, expressive, and
> functional in Spark, while at the same time encouraging extensions and
> specializations within the Spark codebase so that other users can benefit
> from the shared contributions.
>
> One approach could be to start something akin to piggybank in Pig to
> collect user-generated specializations, helper utils, etc.: bundled as
> part of Spark, but not part of core itself.
>
> Thoughts, comments?
>
> Regards,
> Mridul
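For context on the "bundled but not core" idea Mridul raises: one way such contrib helpers can extend an API without touching core classes is Scala's enrich-my-library pattern (implicit classes), the same mechanism Spark itself uses for things like PairRDDFunctions. A minimal, hedged sketch below: `countDistinctBy` is a hypothetical helper, and a plain `Seq` stands in for `RDD` so the snippet runs without a Spark dependency.

```scala
// Sketch: a contrib-style ("sparkbank") helper added via an implicit class.
// In a real module this would wrap org.apache.spark.rdd.RDD; here we wrap
// Seq so the example is self-contained.
object SparkbankSketch {
  implicit class RichSeq[T](val self: Seq[T]) extends AnyVal {
    // Hypothetical specialization: count distinct values of a derived key.
    def countDistinctBy[K](f: T => K): Int = self.map(f).distinct.size
  }

  def main(args: Array[String]): Unit = {
    val words = Seq("spark", "pig", "piggybank", "spark")
    // Reads like a native method, but lives outside the core API surface.
    println(words.countDistinctBy(_.length)) // lengths 5, 3, 9 -> prints 3
  }
}
```

Users opt in with a single import of the implicit, so the core RDD interface stays small while specializations remain discoverable and shared.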