+1, I think this sounds like a great idea.

On Wed, Sep 26, 2012 at 12:07 AM, Rahul <[email protected]> wrote:
> Hi,
>
> I believe every project has a bunch of interesting users who can provide
> additional food for thought to others. Hadoop provides lots of random
> opportunities to people and the same should be possible with crunch. I would
> be delighted to see what people are able to pull off using the existing
> things. These contributions should be kept in crunch as we are pretty young
> and at times we will undergo various refactorings; keeping them in crunch
> will keep them up to date.
>
> And yes, +1 to the idea of keeping dependencies to crunch-core only.
>
> regards,
> rahul
>
> On 26-09-2012 04:32, Josh Wills wrote:
>>
>> I like the idea of having a place in the project that showcases the
>> cool things that you can do with it-- something more advanced and
>> broadly applicable than the starter pipelines we have in
>> crunch-examples, the kind of stuff that you can't easily do using tools
>> like Hive and Pig.
>>
>> I also agree that we don't want to get into dependency creep, so I'd
>> be inclined to limit crunch-bytes (crunch-berries? crunch-bars?
>> crunch-abs?) to just those dependencies that are also in crunch-core.
>> I think the Bloom Filter stuff meets this criterion.
>>
>> The project is still young enough that our problem is much more likely
>> to be attracting new folks than it is to be getting overwhelmed with
>> random contributions, so my inclination is to be welcoming.
>>
>> On Tue, Sep 25, 2012 at 11:29 AM, Matthias Friedrich <[email protected]> wrote:
>>>
>>> Hi Rahul,
>>>
>>> I think it would be really great to have an ecosystem of
>>> micro-libraries around Crunch for all kinds of cool stuff that is
>>> relevant for smaller audiences, just like your Bloom filters.
>>>
>>> But since I expect most of this stuff to be so extremely special, it
>>> would in my opinion make more sense to put this into small, focused
>>> and independent projects that can be released separately from each
>>> other and don't need to go through Crunch's review process. It would
>>> make dependency management easier for users, too, in case a library
>>> needs additional dependencies.
>>>
>>> We could maintain a registry of these projects on Crunch's homepage
>>> so people can find them easily (I expect most of them would end up
>>> on GitHub because it's perfect for this kind of thing). If a project
>>> turns out to be interesting for a larger audience, we can still add it
>>> to Crunch core.
>>>
>>> Regards,
>>> Matthias
>>>
>>> On Tuesday, 2012-09-25, Rahul wrote:
>>>>
>>>> There can be interesting use-cases like BloomFilters which do not
>>>> have a place in the current set of Crunch modules. These functions
>>>> are kind of utility functions that can be used in Crunch. We need to
>>>> create a place where users can share such functions. In the earlier
>>>> discussion for BloomFilters we thought of something that is well
>>>> along the lines of PiggyBank. I had a look at the module but in
>>>> Pig's structure the module is branched under the contrib module as
>>>> there are other modules like penny for monitoring and zebra for
>>>> storage.
>>>>
>>>> I have created a module named *crunch-bytes*, for issue
>>>> https://issues.apache.org/jira/browse/CRUNCH-75, which is a direct
>>>> sub-module in crunch-parent. I named it so because I felt it will
>>>> provide a space to have all those interesting data computations
>>>> that we cannot have in core.
>>>>
>>>> Please share your thoughts on the same.
>>>>
>>>> regards,
>>>> rahul
>>>>
>>
>>
>
-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
