If there are combiners, the reducers shouldn't get any lists longer than a small multiple of the number of maps.
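
To make that bound concrete, here is a rough sketch in plain Python (not Hadoop or Cascading code). It assumes a simple summing aggregation and a combiner that runs exactly once per map task; the function names and the numbers are made up for illustration. In real Hadoop a combiner can run once per spill, or not at all, which is where the "small multiple" comes from. Note that this is a plain sum, not the unique count discussed in the thread below: a distinct count would need the combiner to pass along deduplicated IDs rather than a single number.

```python
from collections import Counter, defaultdict


def run_map_with_combiner(records):
    """One hypothetical map task: combine locally so each key is emitted once."""
    combined = Counter()
    for key, value in records:
        combined[key] += value
    return list(combined.items())


def shuffle(map_outputs):
    """Group the combined map outputs by key, as a reducer would see them."""
    grouped = defaultdict(list)
    for output in map_outputs:
        for key, value in output:
            grouped[key].append(value)
    return grouped


def simulate(num_maps=4, records_per_map=1000, num_keys=20):
    """Return (list length per key, total per key) as seen on the reduce side."""
    splits = [
        [(i % num_keys, 1) for i in range(records_per_map)]
        for _ in range(num_maps)
    ]
    grouped = shuffle(run_map_with_combiner(split) for split in splits)
    lengths = {key: len(values) for key, values in grouped.items()}
    totals = {key: sum(values) for key, values in grouped.items()}
    return lengths, totals


if __name__ == "__main__":
    lengths, totals = simulate()
    # Each key's list has one entry per map, however many raw records there were.
    print(max(lengths.values()), sum(totals.values()))
```

With 20 keys and 4 maps, each reducer-side list has 4 entries even though the maps read 4000 records in total. Without the combiner step, each list would have 200.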

On Sun, Sep 26, 2010 at 6:01 PM, Bradford Stephens <[email protected]> wrote:

> One of the problems with this data set is that I'm grouping by a
> category that has only, say, 20 different values. Then I'm doing a
> unique count of Facebook user IDs per group. I imagine that's not
> pleasant for the reducers.
>
> On Sun, Sep 26, 2010 at 5:41 PM, Alex Kozlov <[email protected]> wrote:
> > Hi Bradford,
> >
> > Sometimes the reducers do not handle merging large chunks of data too well:
> > How many reducers do you have? Try to increase the # of reducers (you can
> > always merge the data later if you are worried about too many output files).
> >
> > --
> > Alex Kozlov
> > Solutions Architect
> > Cloudera, Inc
> > twitter: alexvk2009
> >
> > Hadoop World 2010, October 12, New York City - Register now:
> > http://www.cloudera.com/company/press-center/hadoop-world-nyc/
> >
> > On Sun, Sep 26, 2010 at 5:09 PM, Chris K Wensel <[email protected]> wrote:
> >
> >> Try using a lower threshold value (the num of values in the LRU to cache).
> >> this is the tradeoff of this approach.
> >>
> >> ckw
> >>
> >> On Sep 26, 2010, at 4:46 PM, Bradford Stephens wrote:
> >>
> >> > Sadly, making Chris's changes didn't help.
> >> >
> >> > Here's the Cascading code, it's pretty simple but uses the new
> >> > "combiner"-like functionality:
> >> >
> >> > http://pastebin.com/ccvDmLSX
> >> >
> >> > On Sun, Sep 26, 2010 at 9:37 AM, Ted Dunning <[email protected]> wrote:
> >> >> My feeling is that you have some kind of leak going on in your mappers or
> >> >> reducers and that reducing the number of times the jvm is re-used would
> >> >> improve matters.
> >> >>
> >> >> GC overhead limit indicates that your (tiny) heap is full and collection is
> >> >> not reducing that.
> >> >>
> >> >> On Sun, Sep 26, 2010 at 12:55 AM, Bradford Stephens <[email protected]> wrote:
> >> >>
> >> >>> mapred.job.reuse.jvm.num.tasks=50
> >> >
> >> > --
> >> > Bradford Stephens,
> >> > Founder, Drawn to Scale
> >> > drawntoscalehq.com
> >> > 727.697.7528
> >> >
> >> > http://www.drawntoscalehq.com -- The intuitive, cloud-scale data
> >> > solution. Process, store, query, search, and serve all your data.
> >> >
> >> > http://www.roadtofailure.com -- The Fringes of Scalability, Social
> >> > Media, and Computer Science
> >> >
> >> > --
> >> > You received this message because you are subscribed to the Google Groups
> >> > "cascading-user" group.
> >> > To post to this group, send email to [email protected].
> >> > To unsubscribe from this group, send email to [email protected].
> >> > For more options, visit this group at
> >> > http://groups.google.com/group/cascading-user?hl=en.
> >>
> >> --
> >> Chris K Wensel
> >> [email protected]
> >> http://www.concurrentinc.com
> >>
> >> -- Concurrent, Inc. offers mentoring, support, and licensing for Cascading
