I think SPARK-1063 (PR-503) “Add .sortBy(f) method on RDD” would be a good
example. Note that I’m not saying this PR is already qualified to be
accepted; just take it as an example:
JIRA issue: https://spark-project.atlassian.net/browse/SPARK-1063
GitHub PR: https://github.com/apache/incubator-spark/pull/508
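
For context, here is a rough sketch (mine, not the PR’s actual code) of how
such a method can be layered on the existing keyBy/sortByKey API:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.SparkContext._ // implicit conversion providing sortByKey
    import scala.reflect.ClassTag

    // Key each element by f, sort by that key, then drop the key again.
    def sortBy[T: ClassTag, K: Ordering: ClassTag](
        rdd: RDD[T], f: T => K, ascending: Boolean = true): RDD[T] =
      rdd.keyBy(f).sortByKey(ascending).map(_._2)

This is also an instance of Mridul’s point below: the idiom is already
expressible with the existing API, at the cost of building the intermediate
(key, value) pairs.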

On Feb 23, 2014, at 2:23 PM, Amandeep Khurana <ama...@gmail.com> wrote:

> Mridul,
> 
> Can you give examples of APIs that people have contributed (or wanted
> to contribute) but you categorize as something that would go into
> piggybank-like (sparkbank)? Curious to know how you'd decide what
> should go where.
> 
> Amandeep
> 
>> On Feb 22, 2014, at 10:06 PM, Mridul Muralidharan <mri...@gmail.com> wrote:
>> 
>> Hi,
>> 
>> Over the past few months, I have seen a bunch of pull requests which have
>> extended the Spark API ... most commonly RDD itself.
>> 
>> Most of them are either relatively niche cases of specialization (which
>> might not be useful in most cases) or idioms which can be expressed
>> (sometimes with a minor perf penalty) using the existing API.
>> 
>> While all of them have non-zero value (hence the effort to contribute,
>> which is gladly welcomed!), they extend the API in nontrivial ways and
>> carry a maintenance cost ... and we already have a pending effort to
>> clean up our interfaces prior to 1.0.
>> 
>> I believe there is a need to keep the exposed API succinct, expressive,
>> and functional in Spark, while at the same time encouraging extensions
>> and specializations within the Spark codebase so that other users can
>> benefit from the shared contributions.
>> 
>> One approach could be to start something akin to Piggybank in Pig for
>> user-contributed specializations, helper utils, etc.: bundled as part
>> of Spark, but not part of core itself.
>> 
>> Thoughts, comments?
>> 
>> Regards,
>> Mridul
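
On the “where should it live” question: such specializations need not touch
RDD itself. Spark already uses the enrich-my-library pattern internally
(PairRDDFunctions is added to pair RDDs via an implicit conversion), and a
piggybank-style module could do the same from outside core. A hypothetical
sketch (the names here are made up, not an existing API):

    import org.apache.spark.rdd.RDD
    import scala.language.implicitConversions

    // Wrapper adding a niche helper to any RDD, expressed entirely in
    // terms of existing operations.
    class SparkBankRDDFunctions[T](self: RDD[T]) {
      // Count of distinct elements; a convenience, not a new primitive.
      def countDistinct(): Long = self.distinct().count()
    }

    object SparkBankImplicits {
      implicit def rddToSparkBankRDDFunctions[T](rdd: RDD[T]): SparkBankRDDFunctions[T] =
        new SparkBankRDDFunctions(rdd)
    }

    // Usage: import SparkBankImplicits._ ; then rdd.countDistinct()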
