I'm pretty excited about this.  This removes all the pain of contributing
UDFs.

Russ

On Tue, Aug 31, 2010 at 7:39 AM, Corbin Hoenes <cor...@tynt.com> wrote:

> All sounds reasonable; thanks for explaining the thought process.
>
> On Aug 29, 2010, at 3:11 PM, Dmitriy Ryaboy wrote:
>
> > Hi folks,
> >
> > I'll try to address both Corbin's and Milind's questions. This is just my
> > opinion, I'm open to criticism/suggestions/corrections.
> >
> > There are several barriers that are being removed.
> >
> > First, piggybank will no longer be bound to the pig release schedule. At
> > the moment, I am not sure there will be "releases" of piggybank, as such --
> > we might just tag snapshots with their own git branches and move on. This
> > allows the code to develop at a much faster pace, while possibly
> > sacrificing some of the stability and permanence of Apache-style releases.
> > I feel that this is ok, as piggybank was always subject to less stringent
> > testing, and the attitude towards it has long been "it might work, and you
> > might have to tweak it if it doesn't".
> >
> > Second, moving to github makes it easy for people to cook their own
> > versions of piggybank if they want to -- they just have to fork the main
> > master, and apply changes as needed. The committers can pull in all, or
> > some, of the changes, if they are desirable. This puts such mutations in
> > the public view, as opposed to what's happening now, where they either
> > don't happen, or happen on people's unseen svn exports.
> >
> > Third, this allows contributions to piggybank for older versions of Pig.
> > At the moment, for example, there isn't really a way to contribute a Pig
> > 0.6 loader -- the current svn trunk is on the new API, so such
> > contributions won't compile. Something could be contributed for a 0.6
> > branch, but that won't see the light of day unless the Pig team decides to
> > do a 0.6.1 release, which is highly unlikely and kind of a maintenance
> > nightmare. This is why, for example, my HBase loader changes wound up in
> > Elephant-Bird instead of Pig proper -- I didn't have a good way of getting
> > them out there otherwise. On github, we will be able to just keep a 0.6
> > branch that folks using that version can keep moving.
> >
> > Bottom line is that we are sacrificing the benefits of a stately, strict
> > Apache workflow in order to gain agility and decrease barriers to
> > contribution. I personally feel that this is ok because piggybank is not
> > so much a software project as a collection of individual, distinct
> > libraries. It's kind of the CPAN of Pig, and no one versions all modules
> > of CPAN in one go -- the whole thing would get bogged down if that were to
> > happen. Granted, CPAN lets you pull down specific versions of individual
> > modules, and this doesn't, but let's take it one step at a time.
> >
> > I think the bit about Hive interoperation might be a bit overstated. The
> > observation was just that Hive has the same problem with user-defined
> > functions, and some common code might be reused since the two projects
> > are often used to achieve similar goals. So if the Hive people wanted to
> > collaborate on the common bits, and put their udfs into /hive while we
> > put ours into /pig, we agreed that would be a good thing. There is no
> > intent, at the moment, to build some generic udf interface that would
> > allow one to write udfs for both hive and pig at once. Though that would
> > be cool.
> >
> > -Dmitriy
> >
> > On Sat, Aug 28, 2010 at 11:39 AM, Milind A Bhandarkar <mili...@yahoo-inc.com> wrote:
> >
> >> +1 on the direction.
> >>
> >> A few questions:
> >>
> >> 1. With Pig marching towards becoming a TLP at Apache, can Piggybank
> >> become a full-fledged subproject (with its own releases and all)?
> >> 2. Or, since the ultimate goal is to have a common UDF repository for
> >> both Pig and Hive, would it make sense to make it into an incubator
> >> project, with a name that does not indicate pig dependency?
> >> 3. I see parallels between Howl and proposed Piggybank, since they
> >> aspire to become common components in both Hive and Pig distributions.
> >> What are long term plans for Howl as far as hosting is concerned?
> >>
> >> - Milind
> >>
> >> ________________________________________
> >> From: Dmitriy Ryaboy [dvrya...@gmail.com]
> >> Sent: Friday, August 27, 2010 2:13 PM
> >> To: pig-user@hadoop.apache.org
> >> Subject: Request for Comments: Piggybank future
> >>
> >> Hi folks, at the last Pig contributor meeting, the piggybank question
> >> was discussed -- namely, how to make it easier to contribute to.
> >> (By the way, the contributor meetings are generally open to all comers --
> >> sign up for the pig-dev list if you are interested in that type of
> >> thing.)
> >>
> >> Here's a section of the notes I sent to Pig-dev that documents the
> >> results of the piggybank discussion. How do you, as users, feel about
> >> this plan?
> >>
> >> Piggybank.
> >> Kevin Weil led a discussion of the piggybank. There are a few problems
> >> with it -- it's released on the Pig schedule, and has quite a few
> >> barriers to submission that are, anecdotally at least, preventing people
> >> from contributing. Several options were discussed, with the group
> >> finally settling on starting a community-curated GitHub project for
> >> piggybank. It will have a number of committers from different companies,
> >> and will aim to make it easy for folks to contribute (all contribs will
> >> still have to have tests, and be Apache 2.0-licensed). More details will
> >> be forthcoming as we figure them out. Initially this project will be
> >> seeded with the current Piggybank functions some time after 0.8 is
> >> branched. The initial list of committers is Kevin Weil (Twitter),
> >> Dmitriy Ryaboy (Twitter), Carl Steinbach (Cloudera), and Russell Jurney
> >> (LinkedIn). Yahoo will also nominate someone. Please send us any
> >> thoughts you might have on this subject. It was suggested that a lot of
> >> common code might be shared with Hive UDFs, which have the same problems
> >> as Piggybank does, and that perhaps the project can be another
> >> collaboration point between the projects. Not clear how that would work;
> >> Carl will talk to other Hive people.
> >>
>
>
