This makes sense. Thanks for clarifying, Mridul.

As Sean pointed out - a contrib module quickly turns into a legacy code
base that becomes hard to maintain. From that perspective, I think the idea
of a separate sparkbank GitHub repo that is maintained by Spark contributors
(along with users who wish to contribute add-ons like you've described) and
adheres to the same code quality and review standards as the main project
seems appealing. And then not just sparkbank but other things that people
might want to have as part of the project but that don't belong in the core
codebase could go there? I don't know if things like this have come up in
past pull requests.

-Amandeep

PS: I'm not a Spark committer/contributor, so take my opinion FWIW. :)


On Sun, Feb 23, 2014 at 1:40 AM, Mridul Muralidharan <mri...@gmail.com> wrote:

> Good point, and I was purposefully vague on that, since that is something
> our community should evolve, imo: this was just an initial proposal :-)
>
> For example: there are multiple ways to do a cartesian product - and each
> has its own trade-offs.
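>
> (Purely illustrative, not from any particular pr - a sketch of two ways
> to do it, each with different trade-offs:)
>
>     import org.apache.spark.SparkContext
>
>     def cartesianTwoWays(sc: SparkContext): Unit = {
>       val big   = sc.parallelize(1 to 1000000)
>       val small = sc.parallelize(Seq("a", "b", "c"))
>
>       // Built-in cartesian: one task per pair of partitions, so each
>       // parent partition gets read many times; works at any scale.
>       val viaApi = big.cartesian(small)
>
>       // Broadcast + flatMap: ships the small side to every executor
>       // once, so the big side is read only once; only valid when one
>       // side is small enough to collect to the driver.
>       val smallB   = sc.broadcast(small.collect())
>       val viaBcast = big.flatMap(x => smallB.value.map(y => (x, y)))
>     }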
>
> Another candidate could be, as I mentioned, new methods which can be
> expressed as sequences of existing methods but would be slightly more
> performant if done in one shot - like the self-cartesian PR, various types
> of join (which could become a contrib of its own, btw!), experiments using
> key indexes, ordering, etc.
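>
> (To make the self cartesian case concrete - a hedged sketch, method name
> invented, not the actual pr's code: composed from existing methods, it
> reads the parent twice and generates both orderings of every pair before
> filtering, which a fused one-shot implementation could avoid.)
>
>     import org.apache.spark.rdd.RDD
>
>     // Each unordered pair exactly once, via existing methods only.
>     def selfCartesian[T](rdd: RDD[T]): RDD[(T, T)] = {
>       val indexed = rdd.zipWithIndex()  // tag elements to dedupe pairs
>       indexed.cartesian(indexed)
>         .filter { case ((_, i), (_, j)) => i < j }  // drop mirror pairs
>         .map    { case ((a, _), (b, _)) => (a, b) }
>     }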
>
> Addition into sparkbank or contrib (or something better named!) does not
> preclude future migration into core ... it is just an initial staging area
> for us to evolve the api and get user feedback, without necessarily making
> the spark core api unstable.
>
> Obviously, it is not a dumping ground for broken code/ideas ... and must
> follow the same level of scrutiny and rigour before anything is committed.
>
> Regards
> Mridul
> On Feb 23, 2014 11:53 AM, "Amandeep Khurana" <ama...@gmail.com> wrote:
>
> > Mridul,
> >
> > Can you give examples of APIs that people have contributed (or wanted
> > to contribute) that you would categorize as something that should go
> > into a piggybank-like module (sparkbank)? Curious to know how you'd
> > decide what should go where.
> >
> > Amandeep
> >
> > On Feb 22, 2014, at 10:06 PM, Mridul Muralidharan <mri...@gmail.com>
> > wrote:
> > >
> > > Hi,
> > >
> > >  Over the past few months, I have seen a bunch of pull requests
> > > which have extended the spark api ... most commonly RDD itself.
> > >
> > > Most of them are either relatively niche specializations (which might
> > > not be useful in most cases) or idioms which can be expressed
> > > (sometimes with a minor perf penalty) using the existing api.
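> > >
> > > (As a hedged illustration of that second category - names invented,
> > > not from an actual pr: a convenience method that is just a
> > > composition of existing operators, paying one extra shuffle compared
> > > to what a fused implementation could do.)
> > >
> > >     import scala.reflect.ClassTag
> > >     import org.apache.spark.rdd.RDD
> > >
> > >     // Distinct values per key via the existing api: two shuffles
> > >     // (distinct, then reduceByKey); a fused version could do one.
> > >     def countDistinctPerKey[K: ClassTag, V: ClassTag](
> > >         rdd: RDD[(K, V)]): RDD[(K, Long)] =
> > >       rdd.distinct()          // shuffle 1: dedupe (k, v) pairs
> > >          .mapValues(_ => 1L)
> > >          .reduceByKey(_ + _)  // shuffle 2: count values per key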
> > >
> > > While all of them have non-zero value (hence the effort to
> > > contribute, which is gladly welcomed!), they are extending the api in
> > > nontrivial ways and have a maintenance cost ... and we already have a
> > > pending effort to clean up our interfaces prior to 1.0.
> > >
> > > I believe there is a need to keep the exposed api succinct,
> > > expressive, and functional in spark, while at the same time
> > > encouraging extensions and specialization within the spark codebase
> > > so that other users can benefit from the shared contributions.
> > >
> > > One approach could be to start something akin to piggybank in Pig,
> > > where users contribute specializations, helper utils, etc.: bundled
> > > as part of spark, but not part of core itself.
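> > >
> > > (Sketching the packaging side, all names hypothetical: something
> > > like a separate sbt module that builds and ships with spark but
> > > lives outside core.)
> > >
> > >     // build.sbt (illustrative only)
> > >     lazy val core = project.in(file("core"))
> > >
> > >     lazy val sparkbank = project.in(file("contrib/sparkbank"))
> > >       .dependsOn(core)  // may use core apis; core never depends on
> > >                         // it, so its api can evolve independently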
> > >
> > > Thoughts, comments?
> > >
> > > Regards,
> > > Mridul
> >
>
