It turned out to be a deployment issue: an old version was still
deployed. Ted and Chris's suggestions were spot-on.

I can't believe how BRILLIANT these combiners from Cascading are. It's
cut my processing time down from 20 hours to 50 minutes. AND I cut out
about 80% of my hand-crafted code.
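
For anyone finding this thread later, here is a rough sketch of the
general idea as I understand it: partial counts are kept in a bounded LRU
cache on the map side, and a partial result is emitted whenever an entry
is evicted. This is not Cascading's code. The class LruPartialCounter, its
methods, and the maxKeys parameter are names made up for illustration, and
it counts occurrences per key rather than unique user IDs, which is the
simpler case.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; NOT Cascading's implementation or API.
final class LruPartialCounter {
    // Partial (key, count) records that would be passed on to the reducers.
    private final List<Map.Entry<String, Long>> emitted = new ArrayList<>();
    private final LinkedHashMap<String, Long> cache;

    // maxKeys plays the role of the "threshold" discussed in this thread.
    LruPartialCounter(final int maxKeys) {
        // accessOrder = true keeps the least recently used key first.
        this.cache = new LinkedHashMap<String, Long>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
                if (size() <= maxKeys) {
                    return false;
                }
                // Cache is over capacity: emit the eldest partial count and drop it.
                emitted.add(new AbstractMap.SimpleImmutableEntry<String, Long>(
                        eldest.getKey(), eldest.getValue()));
                return true;
            }
        };
    }

    // Count one occurrence of key; may evict (and emit) the coldest key.
    void add(String key) {
        Long old = cache.get(key); // also marks the key as recently used
        cache.put(key, old == null ? 1L : old + 1L);
    }

    // Emit whatever is still cached and return every partial record.
    List<Map.Entry<String, Long>> flush() {
        for (Map.Entry<String, Long> e : cache.entrySet()) {
            emitted.add(new AbstractMap.SimpleImmutableEntry<String, Long>(
                    e.getKey(), e.getValue()));
        }
        cache.clear();
        return emitted;
    }

    // Reducer-side step: sum the partial counts for each key.
    static Map<String, Long> sumPartials(List<Map.Entry<String, Long>> partials) {
        Map<String, Long> totals = new HashMap<>();
        for (Map.Entry<String, Long> p : partials) {
            totals.merge(p.getKey(), p.getValue(), Long::sum);
        }
        return totals;
    }
}
```

In this sketch a smaller maxKeys uses less heap but sends more partial
records to the reducers, and a larger one does the opposite. The totals
come out the same either way.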

Bravo. I look smart now. (Almost.)

-B

On Sun, Sep 26, 2010 at 7:00 PM, Ted Dunning <[email protected]> wrote:
> If there are combiners, the reducers shouldn't get any lists longer than a
> small multiple of the number of maps.
>
> On Sun, Sep 26, 2010 at 6:01 PM, Bradford Stephens <[email protected]> wrote:
>
>> One of the problems with this data set is that I'm grouping by a
>> category that has only, say, 20 different values. Then I'm doing a
>> unique count of Facebook user IDs per group. I imagine that's not
>> pleasant for the reducers.
>>
>> On Sun, Sep 26, 2010 at 5:41 PM, Alex Kozlov <[email protected]> wrote:
>> > Hi Bradford,
>> >
>> > Sometimes the reducers do not handle merging large chunks of data too
>> > well. How many reducers do you have? Try increasing the number of
>> > reducers (you can always merge the data later if you are worried about
>> > too many output files).
>> >
>> > --
>> > Alex Kozlov
>> > Solutions Architect
>> > Cloudera, Inc
>> > twitter: alexvk2009
>> >
>> > Hadoop World 2010, October 12, New York City - Register now:
>> > http://www.cloudera.com/company/press-center/hadoop-world-nyc/
>> >
>> >
>> > On Sun, Sep 26, 2010 at 5:09 PM, Chris K Wensel <[email protected]> wrote:
>> >
>> >> Try using a lower threshold value (the number of values to cache in
>> >> the LRU). This is the tradeoff of this approach.
>> >>
>> >> ckw
>> >>
>> >> On Sep 26, 2010, at 4:46 PM, Bradford Stephens wrote:
>> >>
>> >> > Sadly, making Chris's changes didn't help.
>> >> >
>> >> > Here's the Cascading code. It's pretty simple but uses the new
>> >> > "combiner"-like functionality:
>> >> >
>> >> > http://pastebin.com/ccvDmLSX
>> >> >
>> >> >
>> >> >
>> >> > On Sun, Sep 26, 2010 at 9:37 AM, Ted Dunning <[email protected]> wrote:
>> >> >> My feeling is that you have some kind of leak going on in your
>> >> >> mappers or reducers and that reducing the number of times the JVM
>> >> >> is re-used would improve matters.
>> >> >>
>> >> >> GC overhead limit indicates that your (tiny) heap is full and
>> >> >> collection is not reducing that.
>> >> >>
>> >> >> On Sun, Sep 26, 2010 at 12:55 AM, Bradford Stephens <[email protected]> wrote:
>> >> >>
>> >> >>> mapred.job.reuse.jvm.num.tasks=50
>> >> >>>
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Bradford Stephens,
>> >> > Founder, Drawn to Scale
>> >> > drawntoscalehq.com
>> >> > 727.697.7528
>> >> >
>> >> > http://www.drawntoscalehq.com --  The intuitive, cloud-scale data
>> >> > solution. Process, store, query, search, and serve all your data.
>> >> >
>> >> > http://www.roadtofailure.com -- The Fringes of Scalability, Social
>> >> > Media, and Computer Science
>> >> >
>> >> > --
>> >> > You received this message because you are subscribed to the Google
>> >> > Groups "cascading-user" group.
>> >> > To post to this group, send email to [email protected].
>> >> > To unsubscribe from this group, send email to [email protected].
>> >> > For more options, visit this group at
>> >> > http://groups.google.com/group/cascading-user?hl=en.
>> >> >
>> >>
>> >> --
>> >> Chris K Wensel
>> >> [email protected]
>> >> http://www.concurrentinc.com
>> >>
>> >> -- Concurrent, Inc. offers mentoring, support, and licensing for
>> Cascading
>> >>
>> >>
>> >
>>
>>
>>
>>
>>
>



