I really like this idea. I'd like to see more sharing of UDFs out in the open.
What barriers to submission are removed by this move? How does a UDF make it into piggybank now vs. before?
On Aug 27, 2010, at 3:13 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
Hi folks, at the last Pig contributor meeting, the piggybank question was discussed: namely, how to make it easier to contribute to. (By the way, the contributor meetings are generally open to all comers; sign up for the pig-dev list if you are interested in that type of thing.)
Here's a section of the notes I sent to Pig-dev that documents the results of the piggybank discussion. How do you, as users, feel about this plan?
Piggybank.
Kevin Weil led a discussion of the piggybank. There are a few problems with it: it's released on the Pig schedule, and it has quite a few barriers to submission that, anecdotally at least, are preventing people from contributing. Several options were discussed, and the group finally settled on starting a community-curated GitHub project for piggybank. It will have a number of committers from different companies and will aim to make it easy for folks to contribute (all contributions will still have to include tests and be Apache 2.0-licensed). More details will be forthcoming as we figure them out. Initially, this project will be seeded with the current Piggybank functions some time after 0.8 is branched. The initial list of committers is Kevin Weil (Twitter), Dmitriy Ryaboy (Twitter), Carl Steinbach (Cloudera), and Russell Jurney (LinkedIn). Yahoo will also nominate someone.
Please send us any thoughts you might have on this subject. It was suggested that a lot of common code might be shared with Hive UDFs, which have the same problems as Piggybank does, and that perhaps the project could be another collaboration point between the two projects. It's not clear how that would work; Carl will talk to other Hive people.