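One of the problems with this data set is that I'm grouping by a category that has only about 20 distinct values, and then doing a unique count of Facebook user IDs within each group. With so few keys, at most 20 or so reducers receive any data no matter how many I configure, and each of them has to work through every user ID for its category. I imagine that's not pleasant for the reducers.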
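To make the shape of the problem concrete, here is a minimal sketch in plain Java of one common way around it: deduplicate the (category, user ID) pairs first, then count per category. This is not the Cascading flow from the pastebin; the class name, method names, and sample rows are made up for illustration, and it assumes each row is just a category and a user ID.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class DistinctPerGroup {

    // Stage 1: drop duplicate (category, userId) pairs.
    // In a real job the whole pair would be the grouping key, so this
    // work would spread across many reducers instead of only ~20.
    static Set<List<String>> dedupePairs(List<String[]> rows) {
        Set<List<String>> pairs = new HashSet<>();
        for (String[] row : rows) {
            pairs.add(Arrays.asList(row[0], row[1]));
        }
        return pairs;
    }

    // Stage 2: count the surviving pairs per category.
    // Only one counter per category is held here, not a set of user IDs.
    static Map<String, Long> countPerCategory(Set<List<String>> pairs) {
        Map<String, Long> counts = new TreeMap<>();
        for (List<String> pair : pairs) {
            counts.merge(pair.get(0), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows: {category, userId}
        List<String[]> rows = Arrays.asList(
            new String[] {"music", "u1"},
            new String[] {"music", "u2"},
            new String[] {"music", "u1"},
            new String[] {"sports", "u1"});
        System.out.println(countPerCategory(dedupePairs(rows)));
        // prints {music=2, sports=1}
    }
}
```

The point of splitting it this way is that the second stage only ever sees one record per distinct pair, so the ~20 category groups stay cheap even when a single category has a huge number of users.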
On Sun, Sep 26, 2010 at 5:41 PM, Alex Kozlov <[email protected]> wrote:
> Hi Bradford,
>
> Sometimes the reducers do not handle merging large chunks of data too well:
> How many reducers do you have? Try to increase the # of reducers (you can
> always merge the data later if you are worried about too many output files).
>
> --
> Alex Kozlov
> Solutions Architect
> Cloudera, Inc
> twitter: alexvk2009
>
> Hadoop World 2010, October 12, New York City - Register now:
> http://www.cloudera.com/company/press-center/hadoop-world-nyc/
>
> On Sun, Sep 26, 2010 at 5:09 PM, Chris K Wensel <[email protected]> wrote:
>
>> Try using a lower threshold value (the num of values in the LRU to cache).
>> this is the tradeoff of this approach.
>>
>> ckw
>>
>> On Sep 26, 2010, at 4:46 PM, Bradford Stephens wrote:
>>
>> > Sadly, making Chris's changes didn't help.
>> >
>> > Here's the Cascading code, it's pretty simple but uses the new
>> > "combiner"-like functionality:
>> >
>> > http://pastebin.com/ccvDmLSX
>> >
>> > On Sun, Sep 26, 2010 at 9:37 AM, Ted Dunning <[email protected]> wrote:
>> >> My feeling is that you have some kind of leak going on in your mappers or
>> >> reducers and that reducing the number of times the jvm is re-used would
>> >> improve matters.
>> >>
>> >> GC overhead limit indicates that your (tiny) heap is full and collection is
>> >> not reducing that.
>> >>
>> >> On Sun, Sep 26, 2010 at 12:55 AM, Bradford Stephens <[email protected]> wrote:
>> >>
>> >>> mapred.job.reuse.jvm.num.tasks=50
>> >
>> > --
>> > Bradford Stephens,
>> > Founder, Drawn to Scale
>> > drawntoscalehq.com
>> > 727.697.7528
>> >
>> > http://www.drawntoscalehq.com -- The intuitive, cloud-scale data
>> > solution. Process, store, query, search, and serve all your data.
>> >
>> > http://www.roadtofailure.com -- The Fringes of Scalability, Social
>> > Media, and Computer Science
>>
>> --
>> Chris K Wensel
>> [email protected]
>> http://www.concurrentinc.com
>>
>> -- Concurrent, Inc. offers mentoring, support, and licensing for Cascading

--
Bradford Stephens,
Founder, Drawn to Scale
drawntoscalehq.com
