I think SPARK-1063 (PR-508) “Add .sortBy(f) method on RDD” would be a good example. Note that I’m not saying this PR is already qualified to be accepted; just take it as an example:

JIRA issue: https://spark-project.atlassian.net/browse/SPARK-1063
GitHub PR: https://github.com/apache/incubator-spark/pull/508
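For reference, the gist of that PR can be expressed through the existing public API by composing keyBy, sortByKey, and values. A minimal sketch (illustrative only, not the exact patch from the PR):

import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// Sketch of sortBy(f) in terms of existing RDD operations: keyBy builds
// (f(t), t) pairs, sortByKey orders them by the derived key, and values
// drops the keys again.
def sortBy[T: ClassTag, K: Ordering: ClassTag](
    rdd: RDD[T], f: T => K, ascending: Boolean = true): RDD[T] =
  rdd.keyBy(f).sortByKey(ascending).values

For example, sortBy(sc.parallelize(Seq("bb", "a", "ccc")), (s: String) => s.length) yields "a", "bb", "ccc".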
On Feb 23, 2014, at 2:23 PM, Amandeep Khurana <ama...@gmail.com> wrote:

> Mridul,
>
> Can you give examples of APIs that people have contributed (or wanted
> to contribute) but that you would categorize as something that should
> go into a piggybank-like module (sparkbank)? Curious to know how you'd
> decide what goes where.
>
> Amandeep
>
>> On Feb 22, 2014, at 10:06 PM, Mridul Muralidharan <mri...@gmail.com> wrote:
>>
>> Hi,
>>
>> Over the past few months, I have seen a bunch of pull requests that
>> extend the Spark API ... most commonly RDD itself.
>>
>> Most of them are either relatively niche cases of specialization
>> (which might not be useful in most cases) or idioms that can be
>> expressed (sometimes with a minor perf penalty) using the existing API.
>>
>> While all of them have non-zero value (hence the effort to contribute,
>> which is gladly welcomed!), they extend the API in nontrivial ways and
>> carry a maintenance cost ... and we already have a pending effort to
>> clean up our interfaces prior to 1.0.
>>
>> I believe there is a need to keep the exposed API succinct, expressive,
>> and functional in Spark, while at the same time encouraging extensions
>> and specialization within the Spark codebase so that other users can
>> benefit from the shared contributions.
>>
>> One approach could be to start something akin to piggybank in Pig for
>> contributing user-generated specializations, helper utils, etc.:
>> bundled as part of Spark, but not part of core itself.
>>
>> Thoughts, comments?
>>
>> Regards,
>> Mridul
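Regarding the piggybank-style idea: one way such a module could host specializations without touching core is Scala's enrich-my-library pattern. Below is a minimal sketch, assuming a hypothetical sparkbank package and RichRDD wrapper (both names invented for illustration):

// Hypothetical "sparkbank" module: helper methods are added to RDD via
// an implicit class, so spark-core's public API stays unchanged.
package sparkbank

import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

object Implicits {
  implicit class RichRDD[T: ClassTag](rdd: RDD[T]) {
    // The same sortBy composition as sketched above, shipped outside core.
    def sortBy[K: Ordering: ClassTag](f: T => K,
                                      ascending: Boolean = true): RDD[T] =
      rdd.keyBy(f).sortByKey(ascending).values
  }
}

Callers would opt in explicitly with "import sparkbank.Implicits._", after which someRdd.sortBy(_.length) works without core itself ever growing the method.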