Which hbase version, Yunqing? When it is hung, can you see which region
it fails on? (It should be in the exception when the reduce fails.) Can you
see why the reduce fails? Is it the TaskTracker timing it out after 10
minutes, or is it timing out on a particular hbase region? If you can
figure the region, see which server it's hosted on (use the UI or master logs
to figure this). Then go to that server and tail its logs. Can you figure
what it's doing? Is it stuck? Thread dump it a few times and see if
you can see where it's blocked -- you can thread dump the server from the
UI. When you thread dump via the UI, it also outputs into the hbase
regionserver log. Post them to this list if you'd like us to look at
them for you.
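
If comparing several dumps by eye gets tedious, a rough sketch like the one below may help. It is only an illustration in Python, and it assumes the dumps are in the jstack-style layout (a quoted thread name on a header line, then frames starting with "at "); the dump written by the UI may be laid out differently, in which case the parsing would need adjusting. The function names are made up for this example.

```python
import re

# Assumed header shape (jstack style): a line that starts with the quoted
# thread name, e.g.  "IPC Server handler 3 on 60020" daemon prio=10 ...
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')


def top_frames(dump_text):
    """Map each thread name to its topmost stack frame in a single dump."""
    frames = {}
    current = None
    for line in dump_text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current = header.group("name")
            continue
        stripped = line.strip()
        # Record only the first "at ..." line seen after each header.
        if current and stripped.startswith("at ") and current not in frames:
            frames[current] = stripped[3:]
    return frames


def stuck_threads(dumps):
    """Return {thread: frame} for threads whose top frame is the same in every dump."""
    per_dump = [top_frames(d) for d in dumps]
    if not per_dump:
        return {}
    first = per_dump[0]
    return {
        name: frame
        for name, frame in first.items()
        if all(other.get(name) == frame for other in per_dump[1:])
    }
```

Pass it the text of a few dumps taken some seconds apart. Idle threads parked in a wait will show up too, so the output is a list of candidates to read through, not a diagnosis.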
Thanks,
St.Ack
Zhou, Yunqing wrote:
I'm using IdentityTableReducer to insert about 1M records into hbase
on a 23-machine cluster.
But I found that sometimes it gets stuck.
Everything is suspended, and the load on every machine drops to zero.
Then some reducers fail, and new reducers begin to insert some records.
Then the phenomenon appears again.
I've set the nofile limit on all machines to 32768.
Do you know why this happens?
Thanks.