Barry:
From the log below, this looks like an issue in HDFS. If a regionserver
has trouble talking to HDFS, it shuts itself down.
Tell us more. Are there other heavy-duty processes running on the same
servers that host the datanodes and regionservers?
Enable DEBUG logging on your cluster and make sure you've raised your
ulimit for file descriptors above the default. See the FAQ on the wiki for
how to do both.
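As a quick sanity check on the file descriptor limit, something like the
Python sketch below reports the limit seen by the current process. Run it as
the user that starts the datanodes and regionservers. The function name and
the 1024 threshold are assumptions for illustration (1024 is a common Linux
default), not values taken from the HBase FAQ.

```python
import resource

# Assumed threshold: 1024 is a common Linux default, used only for illustration.
ASSUMED_DEFAULT_LIMIT = 1024


def describe_nofile_limit(threshold=ASSUMED_DEFAULT_LIMIT):
    """Return (soft, hard, still_at_default) for this process's open-file limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY means "unlimited", so it never counts as still at default.
    still_at_default = soft != resource.RLIM_INFINITY and soft <= threshold
    return soft, hard, still_at_default


if __name__ == "__main__":
    soft, hard, low = describe_nofile_limit()
    print("open files: soft=%s hard=%s" % (soft, hard))
    if low:
        print("soft limit looks like the default; consider raising it")
```

This only reads the limit; raising it is still done through ulimit or the
system limits configuration, as the FAQ describes.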
Thanks,
St.Ack
Barry Haddow wrote:
Hi,
I recently set up a small HBase cluster (v0.18) running on top of Hadoop
v0.18.1. However, I'm observing that the region servers spontaneously shut
themselves down, usually with an UnknownScannerException. For instance, this
weekend I discovered that all four had shut down, with messages like the
following in the logs:
2008-09-29 05:50:17,203 INFO org.apache.hadoop.dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 129.215.197.39:50010
2008-09-29 05:50:17,203 INFO org.apache.hadoop.dfs.DFSClient: Abandoning block blk_-5829206400135277905_3045
2008-09-29 07:29:16,552 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_CALL_SERVER_STARTUP
2008-09-29 07:46:35,796 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 60020, call next(-1347145425990165691) from 129.215.197.39:6999: error: org.apache.hadoop.hbase.UnknownScannerException: Name: -1347145425990165691
The underlying HDFS seems fine: fsck reports the hbase directory as healthy.
After a restart HBase seems fine too, but surely the regionservers should
stay up once they're started.
Any suggestions?
regards
Barry