If there are combiners, the reducers shouldn't get any lists longer than a
small multiple of the number of maps.
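
To make that bound concrete, here is a minimal sketch in plain Python (not Hadoop or Cascading code). It assumes the combiner collapses each map task's output to a single partial set of user IDs per category. The function names and the sample data are hypothetical, made up for illustration. In a real job the combiner can run once per spill rather than once per map, which is why the bound is "a small multiple" rather than exactly the number of maps.

```python
from collections import defaultdict

def run_map_task(records):
    """Map + combine for one task: collapse (category, user_id) pairs into
    one partial set of user IDs per category."""
    partial = defaultdict(set)
    for category, user_id in records:
        partial[category].add(user_id)
    # The combiner emits exactly one value per key for this task.
    return dict(partial)

def shuffle(map_outputs):
    """Group the combined map outputs by key, as a reducer would see them."""
    grouped = defaultdict(list)
    for output in map_outputs:
        for category, ids in output.items():
            grouped[category].append(ids)
    return grouped

def reduce_unique_count(partial_sets):
    """Union the partial sets and count the distinct user IDs."""
    return len(set().union(*partial_sets))

# Hypothetical input: 4 map tasks, 3 categories, made-up user IDs.
splits = [
    [("a", 1), ("a", 2), ("b", 1), ("a", 1)],
    [("a", 2), ("c", 7), ("c", 8)],
    [("b", 1), ("b", 3), ("a", 5)],
    [("c", 7), ("a", 1)],
]

map_outputs = [run_map_task(split) for split in splits]
grouped = shuffle(map_outputs)
counts = {category: reduce_unique_count(sets) for category, sets in grouped.items()}

# Longest value list any reducer key receives; bounded by the number of maps.
longest_list = max(len(sets) for sets in grouped.values())
```

Note that in this sketch each value in the list is a whole set, so the list stays short but the amount of data per key can still be large when there are many distinct IDs.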

On Sun, Sep 26, 2010 at 6:01 PM, Bradford Stephens <[email protected]> wrote:

> One of the problems with this data set is that I'm grouping by a
> category that has only, say, 20 different values. Then I'm doing a
> unique count of Facebook user IDs per group. I imagine that's not
> pleasant for the reducers.
>
> On Sun, Sep 26, 2010 at 5:41 PM, Alex Kozlov <[email protected]> wrote:
> > Hi Bradford,
> >
> > Sometimes the reducers do not handle merging large chunks of data too
> > well: How many reducers do you have?  Try to increase the # of reducers
> > (you can always merge the data later if you are worried about too many
> > output files).
> >
> > --
> > Alex Kozlov
> > Solutions Architect
> > Cloudera, Inc
> > twitter: alexvk2009
> >
> >
> > On Sun, Sep 26, 2010 at 5:09 PM, Chris K Wensel <[email protected]> wrote:
> >
> >> Try using a lower threshold value (the number of values in the LRU to
> >> cache). This is the tradeoff of this approach.
> >>
> >> ckw
> >>
> >> On Sep 26, 2010, at 4:46 PM, Bradford Stephens wrote:
> >>
> >> > Sadly, making Chris's changes didn't help.
> >> >
> >> > Here's the Cascading code, it's pretty simple but uses the new
> >> > "combiner"-like functionality:
> >> >
> >> > http://pastebin.com/ccvDmLSX
> >> >
> >> >
> >> >
> >> > On Sun, Sep 26, 2010 at 9:37 AM, Ted Dunning <[email protected]> wrote:
> >> >> My feeling is that you have some kind of leak going on in your
> >> >> mappers or reducers and that reducing the number of times the JVM is
> >> >> re-used would improve matters.
> >> >>
> >> >> GC overhead limit indicates that your (tiny) heap is full and
> >> >> collection is not reducing that.
> >> >>
> >> >> On Sun, Sep 26, 2010 at 12:55 AM, Bradford Stephens <[email protected]> wrote:
> >> >>
> >> >>> mapred.job.reuse.jvm.num.tasks=50
> >> >>>
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Bradford Stephens,
> >> > Founder, Drawn to Scale
> >> > drawntoscalehq.com
> >> > 727.697.7528
> >> >
> >> > http://www.drawntoscalehq.com --  The intuitive, cloud-scale data
> >> > solution. Process, store, query, search, and serve all your data.
> >> >
> >> > http://www.roadtofailure.com -- The Fringes of Scalability, Social
> >> > Media, and Computer Science
> >> >
> >> > --
> >> > You received this message because you are subscribed to the Google
> >> > Groups "cascading-user" group.
> >> > To post to this group, send email to [email protected].
> >> > To unsubscribe from this group, send email to
> >> > [email protected].
> >> > For more options, visit this group at
> >> > http://groups.google.com/group/cascading-user?hl=en.
> >> >
> >>
> >> --
> >> Chris K Wensel
> >> [email protected]
> >> http://www.concurrentinc.com
> >>
> >> -- Concurrent, Inc. offers mentoring, support, and licensing for
> >> Cascading
> >>
> >>
> >
>
