[hbase] When the master times out a region servers lease, the region server may
not restart
-------------------------------------------------------------------------------------------
Key: HADOOP-2173
URL: https://issues.apache.org/jira/browse/HADOOP-2173
Project: Hadoop
Issue Type: Bug
Components: contrib/hbase
Reporter: Jim Kellerman
Hadoop-Nightly 297 failed because:
* The region server's lease expired (Why? was the heartbeat thread starved?)
* The region server gets a call startup message
* The master splits the region server's log and deletes it.
I think that when the region server called log.closeAndDelete(), it got an
exception (because the file no longer existed) at that point it said "error
restarting server" and quit. From there on the master is just looping because
there is no region server to talk to
We should probably just log an error for log.closeAndDelete() and proceed with
region server restart.
Also for that test, we should probably increase the lease timeout and make the
lease timeout check happen less frequently accordingly
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.