Re: Request for Comments: Piggybank future

Corbin Hoenes Tue, 31 Aug 2010 07:40:00 -0700

All sounds reasonable; thanks for explaining the thought process.

On Aug 29, 2010, at 3:11 PM, Dmitriy Ryaboy wrote:


> Hi folks,
> 
> I'll try to address both Corbin's and Milind's questions. This is just my
> opinion, I'm open to criticism/suggestions/corrections.
> 
> There are several barriers that are being removed.
> 
> First, piggybank will no longer be bound to the pig release schedule. At the
> moment, I am not sure there will be "releases" of piggybank, as such -- we
> might just tag snapshots with their own git branches and move on. This
> allows the code to develop at a much faster pace, while possibly sacrificing
> some of the stability and permanence of Apache-style releases. I feel that
> this is ok, as piggybank was always subject to less stringent testing, and
> the attitude towards it has long been "it might work, and you might have to
> tweak it if it doesn't".
> 
> Second, moving to github makes it easy for people to cook their own versions
> of piggybank if they want to -- they just have to fork the main master, and
> apply changes as needed. The committers can pull in all, or some, of the
> changes, if they are desirable. This puts such mutations in the public view,
> as opposed to what's happening now, where they either don't happen, or
> happen on people's unseen svn exports.
> 
> Third, this allows contributions to piggybank for older versions of pig. At
> the moment, for example, there isn't really a way to contribute a Pig 0.6
> loader -- the current svn trunk is on the new API, so such contributions
> won't compile. Something could be contributed for a 0.6 branch, but that
> won't see the light of day unless the Pig team decides to do a 0.6.1 release,
> which is highly unlikely and kind of a maintenance nightmare. This is why,
> for example, my HBase loader changes wound up in Elephant-Bird instead of
> Pig proper -- I didn't have a good way of getting them out there otherwise.
> On github, we will be able to just keep a 0.6 branch that folks using that
> version can keep moving.
> 
> Bottom line is that we are sacrificing the benefits of a stately, strict
> Apache workflow in order to gain agility and decrease barriers to
> contribution. I personally feel that this is ok because piggybank is not so
> much a software project as a collection of individual, distinct libraries.
> It's kind of the CPAN of Pig, and no one versions all modules of CPAN in one
> go -- the whole thing would get bogged down if that were to happen. Granted,
> CPAN lets you pull down specific versions of individual modules, and this
> doesn't... but let's take it one step at a time.
> 
> I think the bit about Hive interoperation might be a bit overstated. The
> observation was just that Hive has the same problem with user-defined
> functions, and some common code might be reused since the two projects are
> often used to achieve similar goals. So if the Hive people wanted to
> collaborate on the common bits, and put their udfs into /hive while we put
> ours into /pig, we agreed that would be a good thing. There is no intent, at
> the moment, to build some generic udf interface that would allow one to
> write udfs for both hive and pig at once. Though that would be cool.
> 
> -Dmitriy
> 
> On Sat, Aug 28, 2010 at 11:39 AM, Milind A Bhandarkar <mili...@yahoo-inc.com> wrote:
> 
>> +1 on the direction.
>> 
>> A few questions:
>> 
>> 1. With Pig marching towards becoming a TLP at Apache, can Piggybank become
>> a full-fledged subproject (with its own releases and all)?
>> 2. Or since the ultimate goal is to have a common UDF repository for both
>> Pig and Hive, it would make sense to make it into an incubator project, with
>> a name that does not indicate pig dependency?
>> 3. I see parallels between Howl and proposed Piggybank, since they aspire
>> to become common components in both Hive and Pig distributions. What are
>> the long-term plans for Howl as far as hosting is concerned?
>> 
>> - Milind
>> 
>> ________________________________________
>> From: Dmitriy Ryaboy [dvrya...@gmail.com]
>> Sent: Friday, August 27, 2010 2:13 PM
>> To: pig-user@hadoop.apache.org
>> Subject: Request for Comments: Piggybank future
>> 
>> Hi folks, at the last Pig contributor meeting, the piggybank question was
>> discussed -- namely, how to make it easier to contribute to.
>> (by the way, the contributor meetings are generally open to all comers --
>> sign up for the pig-dev list if you are interested in that type of thing).
>> 
>> Here's a section of the notes I sent to Pig-dev that documents the results
>> of the piggybank discussion. How do you, as users, feel about this plan?
>> 
>> Piggybank.
>> Kevin Weil led a discussion of the piggybank. There are a few problems with
>> it -- it's released on the Pig schedule, and has quite a few barriers to
>> submission that are, anecdotally at least, preventing people from
>> contributing. Several options were discussed, with the group finally
>> settling on starting a community-curated GitHub project for piggybank. It
>> will have a number of committers from different companies, and will aim to
>> make it easy for folks to contribute (all contribs will still have to have
>> tests, and be Apache 2.0-licensed). More details will be forthcoming as we
>> figure them out. Initially this project will be seeded with the current
>> Piggybank functions some time after 0.8 is branched. The initial list of
>> committers is Kevin Weil (Twitter), Dmitriy Ryaboy (Twitter), Carl Steinbach
>> (Cloudera), and Russell Jurney (LinkedIn). Yahoo will also nominate someone.
>> Please send us any thoughts you might have on this subject. It was suggested
>> that a lot of common code might be shared with Hive UDFs, which have the
>> same problems as Piggybank does, and that perhaps the project can be another
>> collaboration point between the projects. It is not clear how that would
>> work; Carl will talk to other Hive people.
>>
