I'm also curious what the vetting process will be for this spark-contrib code. Does inclusion in spark-contrib mean that it has received some sort of review and official blessing, or is contrib just a dumping ground for code of questionable quality, utility, maintenance, etc.?
On Sat, Feb 22, 2014 at 10:23 PM, Amandeep Khurana <ama...@gmail.com> wrote:
> Mridul,
>
> Can you give examples of APIs that people have contributed (or wanted
> to contribute) but you categorize as something that would go into
> piggybank-like (sparkbank)? Curious to know how you'd decide what
> should go where.
>
> Amandeep
>
>
> On Feb 22, 2014, at 10:06 PM, Mridul Muralidharan <mri...@gmail.com>
> wrote:
>
> > Hi,
> >
> > Over the past few months, I have seen a bunch of pull requests which
> > have extended the Spark API ... most commonly RDD itself.
> >
> > Most of them are either relatively niche cases of specialization
> > (which might not be useful for most users) or idioms which can be
> > expressed (sometimes with a minor perf penalty) using the existing
> > API.
> >
> > While all of them have non-zero value (hence the effort to
> > contribute, and gladly welcomed!), they extend the API in nontrivial
> > ways and have a maintenance cost ... and we already have a pending
> > effort to clean up our interfaces prior to 1.0.
> >
> > I believe there is a need to keep the exposed API succinct,
> > expressive, and functional in Spark, while at the same time
> > encouraging extensions and specialization within the Spark codebase
> > so that other users can benefit from the shared contributions.
> >
> > One approach could be to start something akin to piggybank in Pig to
> > collect user-generated specializations, helper utils, etc.: bundled
> > as part of Spark, but not part of core itself.
> >
> > Thoughts, comments?
> >
> > Regards,
> > Mridul
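For concreteness, here is a minimal sketch of the pattern Mridul describes: a specialization that is expressible entirely with the existing RDD API and so could live in a piggybank-style module rather than in core. The SparkBankExample object and the countDistinctValuesByKey method are made-up names for illustration, not actual Spark or piggybank code:

    import org.apache.spark.SparkContext._   // brings in the pair-RDD implicits
    import org.apache.spark.rdd.RDD
    import scala.reflect.ClassTag

    // Hypothetical "sparkbank"-style helper: an enrichment class adds a
    // derived operation to pair RDDs without extending RDD in core.
    object SparkBankExample {
      implicit class PairRDDExtras[K: ClassTag, V: ClassTag](rdd: RDD[(K, V)]) {
        // Expressible with existing operators (distinct + countByKey), so it
        // adds no core-API surface or maintenance cost, at the price of an
        // extra shuffle compared to a hand-tuned implementation.
        def countDistinctValuesByKey(): scala.collection.Map[K, Long] =
          rdd.distinct().countByKey()
      }
    }

    // Usage, given a SparkContext sc:
    //   import SparkBankExample._
    //   sc.parallelize(Seq(("a", 1), ("a", 1), ("b", 2)))
    //     .countDistinctValuesByKey()   // Map(a -> 1, b -> 1)

Since the enrichment lives in its own module, users who want the helper import it explicitly, and the core RDD interface stays unchanged.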