Hi,

I believe every project has a bunch of interesting users who can provide additional food for thought to others. Hadoop opens up all sorts of opportunities for people, and the same should be possible with Crunch. I would be delighted to see what people are able to pull off with what already exists. These contributions should be kept in Crunch: we are still pretty young and will go through various refactorings from time to time, and keeping the contributions in Crunch will keep them up to date.

And yes, +1 to the idea of keeping dependencies to crunch-core only.

regards,
rahul
On 26-09-2012 04:32, Josh Wills wrote:
I like the idea of having a place in the project that showcases the
cool things that you can do with it-- something more advanced and
broadly applicable than the starter pipelines we have in
crunch-examples, the kind of stuff that you can't easily do using tools
like Hive and Pig.

I also agree that we don't want to get into dependency creep, so I'd
be inclined to limit crunch-bytes (crunch-berries? crunch-bars?
crunch-abs?) to just those dependencies that are also in crunch-core.
I think the Bloom Filter stuff meets this criterion.

The project is still young enough that our problem is much more likely
to be attracting new folks than it is to be getting overwhelmed with
random contributions, so my inclination is to be welcoming.

On Tue, Sep 25, 2012 at 11:29 AM, Matthias Friedrich <[email protected]> wrote:
Hi Rahul,

I think it would be really great to have an ecosystem of
micro-libraries around Crunch for all kinds of cool stuff that is
relevant for smaller audiences, just like your Bloom filters.

But since I expect most of this stuff to be highly specialized, it
would in my opinion make more sense to put this into small, focused
and independent projects that can be released separately from each
other and don't need to go through Crunch's review process. It would
make dependency management easier for users, too, in case a library
needs additional dependencies.

We could maintain a registry of these projects on Crunch's homepage
so people can find them easily (I expect most of them would end up
at GitHub because it's perfect for this kind of thing). If a project
turns out to be interesting for a larger audience, we can still add it
to Crunch core.

Regards,
   Matthias

On Tuesday, 2012-09-25, Rahul wrote:
There can be interesting use-cases like BloomFilters which do not
have a place in the current set of Crunch modules. These are
essentially utility functions that can be used in Crunch. We need to
create a place where users can share such functions. In the earlier
discussion about BloomFilters we thought of something along the
lines of PiggyBank. I had a look at that module, but in Pig's
structure it sits under the contrib module, alongside other modules
like penny for monitoring and zebra for storage.

I have created a module named *crunch-bytes* for issue
https://issues.apache.org/jira/browse/CRUNCH-75, which is a direct
sub-module of crunch-parent. I chose the name because I felt it would
provide a space for all those interesting data computations that we
cannot have in core.

Please share your thoughts on this.

regards,
rahul
