[
https://issues.apache.org/jira/browse/HADOOP-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12545737
]
Jim Kellerman commented on HADOOP-2173:
---------------------------------------
This should be fixed with the commit for HADOOP-2276. Leaving open in case
other circumstances also exhibit this bug.
> [hbase] When the master times out a region servers lease, the region server
> may not restart
> -------------------------------------------------------------------------------------------
>
> Key: HADOOP-2173
> URL: https://issues.apache.org/jira/browse/HADOOP-2173
> Project: Hadoop
> Issue Type: Bug
> Components: contrib/hbase
> Reporter: Jim Kellerman
> Assignee: Jim Kellerman
>
> Hadoop-Nightly 297 failed because:
> * The region server's lease expired (Why? Was the heartbeat thread
> starved?)
> * The region server gets a call startup message
> * The master splits the region server's log and deletes it.
> I think that when the region server called log.closeAndDelete(), it got an
> exception (because the file no longer existed). At that point it said "error
> restarting server" and quit. From there on the master just loops because
> there is no region server to talk to.
> We should probably just log an error for log.closeAndDelete() and proceed
> with region server restart.
> Also, for that test, we should probably increase the lease timeout and make
> the lease timeout check happen correspondingly less frequently.
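
The proposed handling of log.closeAndDelete() could look roughly like the sketch below. This is only an illustration under stated assumptions: RestartSketch, the Log interface, and restart() are hypothetical names, not the actual HBase region server classes, and the real restart path does more work than shown here.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch; these names are illustrative, not HBase's real classes.
class RestartSketch {
    private static final Logger LOG = Logger.getLogger("RestartSketch");

    /** Stand-in for the region server's log (assumed interface). */
    interface Log {
        void closeAndDelete() throws IOException;
    }

    /**
     * Tolerates a failed closeAndDelete(), for example because the master
     * already split and deleted the log, and continues with the restart.
     * Returns true once the steps after the log cleanup have been reached.
     */
    static boolean restart(Log log) {
        try {
            log.closeAndDelete();
        } catch (IOException e) {
            // Log the error and carry on instead of aborting the restart.
            LOG.log(Level.SEVERE, "closeAndDelete failed; continuing restart", e);
        }
        // ... reinitialize state and report back to the master here ...
        return true;
    }

    public static void main(String[] args) {
        // Simulate the failure described in the issue: the log file is gone.
        Log missing = () -> { throw new IOException("log file no longer exists"); };
        System.out.println(restart(missing));
    }
}
```

The point of the sketch is only that the exception is caught and logged at the cleanup step, so a missing log file no longer ends the restart and leaves the master with no region server to talk to.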