Which HBase version are you running, Yunqing? When it hangs, can you see which region it fails on? (It should be in the exception when the reduce fails.) Can you see why the reduce fails? Is the TaskTracker timing it out after 10 minutes, or is it timing out on a particular HBase region? If you can identify the region, see which server it's hosted on (use the UI or the master logs to figure this out). Then go to that server and tail its logs. Can you figure out what it's doing? Is it stuck? Thread dump it a few times and see if you can tell where it's blocked; you can thread dump the server from the UI. When you thread dump via the UI, the dump is also written to the HBase regionserver log. Post the dumps to this list if you'd like us to look at them for you.
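
If it helps with comparing several thread dumps, here is a rough Python sketch. It assumes the dumps are jstack-style text (a quoted thread name on a header line, followed by indented "at ..." frames); the dump you get from the UI may be laid out differently, in which case the two regular expressions would need adjusting. The function names are made up for this example and are not part of HBase.

```python
import re

# Header line of a thread in jstack-style output (assumed format), e.g.
#   "IPC Server handler 3 on 60020" daemon prio=10 tid=0x... nid=0x...
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')
# An indented stack frame line such as "\tat java.lang.Object.wait(Native Method)"
FRAME = re.compile(r'^\s+at\s+(?P<frame>.+?)\s*$')


def top_frames(dump_text):
    """Map each thread name in one dump to its topmost stack frame."""
    frames = {}
    current = None
    for line in dump_text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current = header.group("name")
            continue
        frame = FRAME.match(line)
        # Only the first frame after a header is the top of that thread's stack.
        if frame and current is not None and current not in frames:
            frames[current] = frame.group("frame")
    return frames


def stuck_threads(dumps):
    """Return {thread: frame} for threads whose top frame is identical in every dump."""
    per_dump = [top_frames(d) for d in dumps]
    if not per_dump:
        return {}
    # Consider only threads that appear in all dumps.
    common = set(per_dump[0])
    for frames in per_dump[1:]:
        common &= set(frames)
    first = per_dump[0]
    return {
        name: first[name]
        for name in sorted(common)
        if all(frames[name] == first[name] for frames in per_dump)
    }
```

Pass it the text of three or four dumps taken some seconds apart. Threads that are merely idle (parked waiting for work) will also show the same top frame every time, so the result is only a shortlist; the interesting entries are the ones sitting inside an HBase or HDFS call.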

Thanks,
St.Ack

Zhou, Yunqing wrote:
I'm using IdentityTableReducer to insert about 1M records into HBase
on a 23-machine cluster,
but I've found that it sometimes hangs:
everything is suspended and the workload on every machine drops to zero.

Then some reducers fail and new reducers begin inserting records,
and then the phenomenon appears again.

I've set the nofile limit on all machines to 32768.
Do you know why this happens?
Thanks.
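
A note on the nofile setting: a raised limit only applies to processes started after the change, so it can be worth confirming what the running regionserver and datanode processes actually have. On Linux the effective limits of a process are listed in /proc/<pid>/limits. The following is a small Python sketch for reading them; the function names are illustrative, and the pid is whichever process you want to inspect.

```python
def _to_int(value):
    """Convert a limit field to int, leaving 'unlimited' as a string."""
    return int(value) if value.isdigit() else value


def max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits text."""
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            # The row is: label, soft limit, hard limit, units.
            soft, hard = line[len(label):].split()[:2]
            return _to_int(soft), _to_int(hard)
    return None  # row not present


def read_max_open_files(pid):
    """Read the open-file limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return max_open_files(f.read())
```

If the soft value printed for a daemon is still the old default rather than 32768, that daemon was most likely started before the new limit took effect.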
