All sounds reasonable. Thanks for explaining the thought process.

On Aug 29, 2010, at 3:11 PM, Dmitriy Ryaboy wrote:
> Hi folks,
>
> I'll try to address both Corbin's and Milind's questions. This is just my
> opinion, I'm open to criticism/suggestions/corrections.
>
> There are several barriers that are being removed.
>
> First, piggybank will no longer be bound to the pig release schedule. At the
> moment, I am not sure there will be "releases" of piggybank, as such -- we
> might just tag snapshots with their own git branches and move on. This
> allows the code to develop at a much faster pace, while possibly sacrificing
> some of the stability and permanence of Apache-style releases. I feel that
> this is ok, as piggybank was always subject to less stringent testing, and
> the attitude towards it has long been "it might work, and you might have to
> tweak it if it doesn't".
>
> Second, moving to github makes it easy for people to cook their own versions
> of piggybank if they want to -- they just have to fork the main master, and
> apply changes as needed. The committers can pull in all, or some, of the
> changes, if they are desirable. This puts such mutations in the public view,
> as opposed to what's happening now, where they either don't happen, or
> happen on people's unseen svn exports.
>
> Third, this allows contributions to piggybank for older versions of pig. At
> the moment, for example, there isn't really a way to contribute a Pig 0.6
> loader -- the current svn trunk is on the new API, so such contributions
> won't compile. Something could be contributed for a 0.6 branch, but that
> won't see the light of day unless the Pig team decides to do a 0.6.1 release,
> which is highly unlikely and kind of a maintenance nightmare. This is why,
> for example, my HBase loader changes wound up in Elephant-Bird instead of
> Pig proper -- I didn't have a good way of getting them out there otherwise.
> On github, we will be able to just keep a 0.6 branch that folks using that
> version can keep moving.
>
> Bottom line is that we are sacrificing the benefits of a stately, strict
> Apache workflow in order to gain agility and decrease barriers to
> contribution. I personally feel that this is ok because piggybank is not so
> much a software project as a collection of individual, distinct libraries.
> It's kind of the CPAN of Pig, and no one versions all modules of CPAN in one
> go -- the whole thing would get bogged down if that were to happen. Granted,
> CPAN lets you pull down specific versions of individual modules, and this
> doesn't... but let's take it one step at a time.
>
> I think the bit about Hive interoperation might be a bit overstated. The
> observation was just that Hive has the same problem with user-defined
> functions, and some common code might be reused since the two projects are
> often used to achieve similar goals. So if the Hive people wanted to
> collaborate on the common bits, and put their udfs into /hive while we put
> ours into /pig, we agreed that would be a good thing. There is no intent, at
> the moment, to build some generic udf interface that would allow one to
> write udfs for both hive and pig at once. Though that would be cool.
>
> -Dmitriy
>
> On Sat, Aug 28, 2010 at 11:39 AM, Milind A Bhandarkar <mili...@yahoo-inc.com> wrote:
>
>> +1 on the direction.
>>
>> A few questions:
>>
>> 1. With Pig marching towards becoming a TLP at Apache, can Piggybank become
>> a full-fledged subproject (with its own releases and all)?
>> 2. Or since the ultimate goal is to have a common UDF repository for both
>> Pig and Hive, it would make sense to make it into an incubator project, with
>> a name that does not indicate pig dependency?
>> 3. I see parallels between Howl and the proposed Piggybank, since they aspire
>> to become common components in both Hive and Pig distributions. What are
>> the long-term plans for Howl as far as hosting is concerned?
>>
>> - Milind
>>
>> ________________________________________
>> From: Dmitriy Ryaboy [dvrya...@gmail.com]
>> Sent: Friday, August 27, 2010 2:13 PM
>> To: pig-user@hadoop.apache.org
>> Subject: Request for Comments: Piggybank future
>>
>> Hi folks, at the last Pig contributor meeting, the piggybank question was
>> discussed -- namely, how to make it easier to contribute to.
>> (by the way, the contributor meetings are generally open to all comers --
>> sign up for the pig-dev list if you are interested in that type of thing).
>>
>> Here's a section of the notes I sent to Pig-dev that documents the results
>> of the piggybank discussion. How do you, as users, feel about this plan?
>>
>> Piggybank.
>> Kevin Weil led a discussion of the piggybank. There are a few problems with
>> it -- it's released on the Pig schedule, and has quite a few barriers to
>> submission that are, anecdotally at least, preventing people from
>> contributing. Several options were discussed, with the group finally
>> settling on starting a community-curated GitHub project for piggybank. It
>> will have a number of committers from different companies, and will aim to
>> make it easy for folks to contribute (all contribs will still have to have
>> tests, and be Apache 2.0-licensed). More details will be forthcoming as we
>> figure them out. Initially this project will be seeded with the current
>> Piggybank functions some time after 0.8 is branched. The initial list of
>> committers is Kevin Weil (Twitter), Dmitriy Ryaboy (Twitter), Carl Steinbach
>> (Cloudera), and Russel Jurney (LinkedIn). Yahoo will also nominate someone.
>> Please send us any thoughts you might have on this subject. It was
>> suggested that a lot of common code might be shared with Hive UDFs, which
>> have the same problems as Piggybank does, and that perhaps the project can
>> be another collaboration point between the projects. It's not clear how that
>> would work; Carl will talk to other Hive people.
>>