I'm pretty excited about this. This removes all the pain of contributing UDFs.
Russ

On Tue, Aug 31, 2010 at 7:39 AM, Corbin Hoenes <cor...@tynt.com> wrote:
> All sounds reasonable, thanks for explaining the thought process.
>
> On Aug 29, 2010, at 3:11 PM, Dmitriy Ryaboy wrote:
>
> > Hi folks,
> >
> > I'll try to address both Corbin's and Milind's questions. This is just
> > my opinion, I'm open to criticism/suggestions/corrections.
> >
> > There are several barriers that are being removed.
> >
> > First, piggybank will no longer be bound to the pig release schedule.
> > At the moment, I am not sure there will be "releases" of piggybank, as
> > such -- we might just tag snapshots with their own git branches and
> > move on. This allows the code to develop at a much faster pace, while
> > possibly sacrificing some of the stability and permanence of
> > Apache-style releases. I feel that this is ok, as piggybank was always
> > subject to less stringent testing, and the attitude towards it has
> > long been "it might work, and you might have to tweak it if it
> > doesn't".
> >
> > Second, moving to github makes it easy for people to cook their own
> > versions of piggybank if they want to -- they just have to fork the
> > main master, and apply changes as needed. The committers can pull in
> > all, or some, of the changes, if they are desirable. This puts such
> > mutations in the public view, as opposed to what's happening now,
> > where they either don't happen, or happen on people's unseen svn
> > exports.
> >
> > Third, this allows contributions to piggybank for older versions of
> > pig. At the moment, for example, there isn't really a way to
> > contribute a Pig 0.6 loader -- the current svn trunk is on the new
> > API, so such contributions won't compile. Something could be
> > contributed for a 0.6 branch, but that won't see the light of day
> > unless the Pig team decides to do a 0.6.1 release, which is highly
> > unlikely and kind of a maintenance nightmare. This is why, for
> > example, my HBase loader changes wound up in Elephant-Bird instead of
> > Pig proper -- I didn't have a good way of getting them out there
> > otherwise. On github, we will be able to just keep a 0.6 branch that
> > folks using that version can keep moving.
> >
> > Bottom line is that we are sacrificing the benefits of a stately,
> > strict Apache workflow in order to gain agility and decrease barriers
> > to contribution. I personally feel that this is ok because piggybank
> > is not so much a software project as a collection of individual,
> > distinct libraries. It's kind of the CPAN of Pig, and no one versions
> > all modules of CPAN in one go -- the whole thing would get bogged down
> > if that were to happen. Granted, cpan lets you pull down specific
> > versions of individual modules, and this doesn't.. but let's take it
> > one step at a time.
> >
> > I think the bit about Hive interoperation might be a bit overstated.
> > The observation was just that Hive has the same problem with
> > user-defined functions, and some common code might be reused since the
> > two projects are often used to achieve similar goals. So if the Hive
> > people wanted to collaborate on the common bits, and put their udfs
> > into /hive while we put ours into /pig, we agreed that would be a good
> > thing. There is no intent, at the moment, to build some generic udf
> > interface that would allow one to write udfs for both hive and pig at
> > once. Though that would be cool.
> >
> > -Dmitriy
> >
> > On Sat, Aug 28, 2010 at 11:39 AM, Milind A Bhandarkar
> > <mili...@yahoo-inc.com> wrote:
> >
> >> +1 on the direction.
> >>
> >> A few questions:
> >>
> >> 1. With Pig marching towards becoming a TLP at Apache, can Piggybank
> >> become a full-fledged subproject (with its own releases and all)?
> >> 2. Or since the ultimate goal is to have a common UDF repository for
> >> both Pig and Hive, it would make sense to make it into an incubator
> >> project, with a name that does not indicate pig dependency?
> >> 3. I see parallels between Howl and proposed Piggybank, since they
> >> aspire to become common components in both Hive and Pig
> >> distributions. What are long term plans for Howl as far as hosting is
> >> concerned?
> >>
> >> - Milind
> >>
> >> ________________________________________
> >> From: Dmitriy Ryaboy [dvrya...@gmail.com]
> >> Sent: Friday, August 27, 2010 2:13 PM
> >> To: pig-user@hadoop.apache.org
> >> Subject: Request for Comments: Piggybank future
> >>
> >> Hi folks, at the last Pig contributor meeting, the piggybank question
> >> was discussed -- namely, how to make it easier to contribute to.
> >> (by the way, the contributor meetings are generally open to all
> >> comers -- sign up for the pig-dev list if you are interested in that
> >> type of thing).
> >>
> >> Here's a section of the notes I sent to Pig-dev that documents the
> >> results of the piggybank discussion. How do you, as users, feel about
> >> this plan?
> >>
> >> Piggybank.
> >> Kevin Weil led a discussion of the piggybank. There are a few
> >> problems with it -- it's released on the Pig schedule, and has quite
> >> a few barriers to submission that are, anecdotally at least,
> >> preventing people from contributing. Several options were discussed,
> >> with the group finally settling on starting a community-curated
> >> GitHub project for piggybank. It will have a number of committers
> >> from different companies, and will aim to make it easy for folks to
> >> contribute (all contribs will still have to have tests, and be Apache
> >> 2.0-licensed). More details will be forthcoming as we figure them
> >> out. Initially this project will be seeded with the current Piggybank
> >> functions some time after 0.8 is branched. The initial list of
> >> committers: Kevin Weil (Twitter), Dmitriy Ryaboy (Twitter), Carl
> >> Steinbach (Cloudera), and Russel Jurney (LinkedIn). Yahoo will also
> >> nominate someone. Please send us any thoughts you might have on this
> >> subject. It was suggested that a lot of common code might be shared
> >> with Hive UDFs, which have the same problems as Piggybank does, and
> >> that perhaps the project can be another collaboration point between
> >> the projects. Not clear how that would work, Carl will talk to other
> >> Hive people.