Good point, and I was purposefully vague on that since that is something our community should evolve, imo: this was just an initial proposal :-)
For example: there are multiple ways to do cartesian, and each has its own trade-offs. Another candidate could be, as I mentioned, new methods which can be expressed as sequences of existing methods but would be slightly more performant if done in one shot - like the self-cartesian PR (rough sketch at the bottom of this mail), various types of join (which could become a contrib of its own, btw!), experiments using key indexes, ordering, etc.

Addition into sparkbank or contrib (or something better named!) does not preclude future migration into core ... it is just an initial staging area for us to evolve the api and get user feedback, without necessarily making the spark core api unstable. Obviously, it is not a dumping ground for broken code/ideas ... contributions must go through the same level of scrutiny and rigour before committing.

Regards,
Mridul

On Feb 23, 2014 11:53 AM, "Amandeep Khurana" <ama...@gmail.com> wrote:
> Mridul,
>
> Can you give examples of APIs that people have contributed (or wanted
> to contribute) but you would categorize as something that would go into
> the piggybank-like (sparkbank)? Curious to know how you'd decide what
> should go where.
>
> Amandeep
>
> On Feb 22, 2014, at 10:06 PM, Mridul Muralidharan <mri...@gmail.com> wrote:
> >
> > Hi,
> >
> > Over the past few months, I have seen a bunch of pull requests which have
> > extended the spark api ... most commonly RDD itself.
> >
> > Most of them are either relatively niche cases of specialization (which
> > might not be useful for most cases) or idioms which can be expressed
> > (sometimes with a minor perf penalty) using the existing api.
> >
> > While all of them have non-zero value (hence the effort to contribute, and
> > gladly welcomed!), they extend the api in nontrivial ways and have a
> > maintenance cost ... and we already have a pending effort to clean up our
> > interfaces prior to 1.0.
> >
> > I believe there is a need to keep the exposed api succinct, expressive and
> > functional in spark, while at the same time encouraging extensions and
> > specialization within the spark codebase so that other users can benefit
> > from the shared contributions.
> >
> > One approach could be to start something akin to piggybank in pig to
> > contribute user-generated specializations, helper utils, etc: bundled as
> > part of spark, but not part of core itself.
> >
> > Thoughts, comments?
> >
> > Regards,
> > Mridul
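
P.S. For concreteness, here is a rough sketch of the kind of helper I have in mind for such a contrib module. The names (ContribRDDFunctions, selfCartesian) are placeholders, and I am assuming "self cartesian" here to mean all unordered pairs of distinct elements. The point is only that it can be written today as a sequence of existing RDD methods, while a dedicated one-shot implementation in contrib could skip the indexing and the filter over the full n^2 pairs:

    import org.apache.spark.rdd.RDD

    object ContribRDDFunctions {
      // Enrich RDD through the usual implicit-class pattern, so the helper can
      // live in a contrib/"sparkbank" module without touching core RDD.
      implicit class SelfCartesianOps[T](rdd: RDD[T]) {
        // Composed purely from existing methods: tag each element with an index,
        // take the full cartesian product, then keep each unordered pair once.
        def selfCartesian(): RDD[(T, T)] = {
          val indexed = rdd.zipWithIndex()
          indexed.cartesian(indexed)
            .filter { case ((_, i), (_, j)) => i < j }
            .map { case ((a, _), (b, _)) => (a, b) }
        }
      }
    }

    // Usage (sc is an existing SparkContext):
    //   import ContribRDDFunctions._
    //   sc.parallelize(Seq(1, 2, 3)).selfCartesian()   // (1,2), (1,3), (2,3)

The implicit-class pattern keeps this entirely outside core RDD, which is exactly the property I want for a staging area: such a helper could graduate into core later if it proves widely useful, without the core api having carried it in the meantime.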