Re: New module to share user functions

Brock Noland Wed, 26 Sep 2012 04:33:37 -0700

+1, I think this sounds like a great idea.

On Wed, Sep 26, 2012 at 12:07 AM, Rahul <[email protected]> wrote:
> Hi,
>
> I believe every project has a bunch of interesting users who can provide
> additional food for thought to others. Hadoop provides lots of random
> opportunities to people and the same should be possible with Crunch. I would
> be delighted to see what people are able to pull off using the existing
> things. These contributions should be kept in Crunch: we are pretty young
> and at times will undergo various refactorings, so keeping them in Crunch
> will keep them up to date.
>
> And yes, +1 to the idea of keeping dependencies to crunch-core only.
>
> regards,
> rahul
>
> On 26-09-2012 04:32, Josh Wills wrote:
>>
>> I like the idea of having a place in the project that showcases the
>> cool things that you can do with it-- something more advanced and
>> broadly applicable than the starter pipelines we have in
>> crunch-examples, the kind of stuff that you can't easily do using tools
>> like Hive and Pig.
>>
>> I also agree that we don't want to get into dependency creep, so I'd
>> be inclined to limit crunch-bytes (crunch-berries? crunch-bars?
>> crunch-abs?) to just those dependencies that are also in crunch-core.
>> I think the Bloom Filter stuff meets this criterion.
>>
>> The project is still young enough that our problem is much more likely
>> to be attracting new folks than it is to be getting overwhelmed with
>> random contributions, so my inclination is to be welcoming.
>>
>> On Tue, Sep 25, 2012 at 11:29 AM, Matthias Friedrich <[email protected]> wrote:
>>>
>>> Hi Rahul,
>>>
>>> I think it would be really great to have an ecosystem of
>>> micro-libraries around Crunch for all kinds of cool stuff that is
>>> relevant for smaller audiences, just like your Bloom filters.
>>>
>>> But since I expect most of this stuff to be so extremely special, it
>>> would in my opinion make more sense to put this into small, focused
>>> and independent projects that can be released separately from each
>>> other and don't need to go through Crunch's review process. It would
>>> make dependency management easier for users, too, in case a library
>>> needs additional dependencies.
>>>
>>> We could maintain a registry of these projects on Crunch's homepage
>>> so people can find them easily (I expect most of them would end up
>>> at GitHub because it's perfect for this kind of thing). If a project
>>> turns out to be interesting for a larger audience, we can still add it
>>> to Crunch core.
>>>
>>> Regards,
>>>    Matthias
>>>
>>> On Tuesday, 2012-09-25, Rahul wrote:
>>>>
>>>> There can be interesting use-cases like BloomFilters which do not
>>>> have a place in the current set of Crunch modules. These functions
>>>> are kind of utility functions that can be used in Crunch. We need to
>>>> create a place where users can share such functions. In the earlier
>>>> discussion for BloomFilters we thought of something that is well
>>>> along the lines of PiggyBank. I had a look at the module, but in
>>>> Pig's structure the module is branched under the contrib module as there
>>>> are other modules like penny for monitoring and zebra for storage.
>>>>
>>>> I have created a module named *crunch-bytes*, for issue
>>>> https://issues.apache.org/jira/browse/CRUNCH-75, which is a direct
>>>> sub-module in crunch-parent. I named it so because I felt it will
>>>> provide a space to have all those interesting data computations
>>>> that we cannot have in core.
>>>>
>>>> Please share your thoughts for the same.
>>>>
>>>> regards,
>>>> rahul
>>>>
>>
>>
>




-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
