Dave Shine, can you share how much data each map task is processing? If the load across map tasks is uneven, it might be a hot-spotting problem. Have a look at http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/ . I have faced the same problem and am trying to implement HBaseWD.
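To illustrate the general idea behind key prefixing (the approach the linked post describes for sequential keys), here is a minimal sketch in plain Python. It is not the HBaseWD API and not Hadoop code: `partition`, `salted`, `unsalted`, `reducer_loads`, the bucket count of 16 and the synthetic data are all assumptions made up for illustration. Only the reducer count of 43 comes from the original question.

```python
import zlib
from collections import Counter

NUM_REDUCERS = 43  # reducer count mentioned in the original question
NUM_SALTS = 16     # hypothetical number of prefix buckets


def partition(key: str, num_reducers: int = NUM_REDUCERS) -> int:
    """Stand-in for a hash partitioner: stable hash modulo reducer count."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def salted(key: str, record_no: int, num_salts: int = NUM_SALTS) -> str:
    """Prefix the key with a bucket number so a hot key maps to several partitions."""
    return f"{record_no % num_salts:02d}|{key}"


def unsalted(salted_key: str) -> str:
    """Strip the bucket prefix to recover the original key."""
    return salted_key.split("|", 1)[1]


def reducer_loads(keys) -> Counter:
    """Count how many records each (simulated) reducer would receive."""
    return Counter(partition(k) for k in keys)


# Synthetic, deliberately skewed input: one hot key plus many distinct keys.
records = ["hot"] * 10_000 + [f"key{i}" for i in range(10_000)]

plain = reducer_loads(records)
spread = reducer_loads(salted(k, i) for i, k in enumerate(records))

print("largest reducer load without prefix:", max(plain.values()))
print("largest reducer load with prefix:   ", max(spread.values()))
```

Note that with prefixed keys each reducer only sees part of the records for a hot key, so a follow-up step has to strip the prefix (as `unsalted` does) and merge the partial results per original key.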
Thanks and Regards,
S SYED ABDUL KATHER

On Fri, Jul 20, 2012 at 6:50 PM, Dave Shine <dave.sh...@channelintelligence.com> wrote:

> I have a job that is emitting over 3 billion rows from the map to the
> reduce. The job is configured with 43 reduce tasks. A perfectly even
> distribution would amount to about 70 million rows per reduce task.
> However I actually got around 60 million for most of the tasks, one task
> got over 100 million, and one task got almost 350 million. This uneven
> distribution caused the job to run exceedingly long.
>
> I believe this is referred to as a “key skew problem”, which I know is
> heavily dependent on the actual data being processed. Can anyone point me
> to any blog posts, white papers, etc. that might give me some options on
> how to deal with this issue?
>
> Thanks,
> Dave Shine
> Sr. Software Engineer
> 321.939.5093 direct | 407.314.0122 mobile