[
https://issues.apache.org/jira/browse/HBASE-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876542#action_12876542
]
Jean-Daniel Cryans commented on HBASE-2691:
-------------------------------------------
The RS's session has expired; it reports back to the master right after that
(it's marked dead in the master) and trips into:
{code}
private void checkIsDead(final String serverName, final String what)
throws LeaseStillHeldException {
  if (!isDead(serverName)) return;
  LOG.debug("Server " + what + " rejected; currently processing " +
    serverName + " as dead server");
  throw new Leases.LeaseStillHeldException(serverName);
}
{code}
Which I see in the log. Then, on the HRS side, this falls into:
{code}
} catch (Exception e) { // FindBugs REC_CATCH_EXCEPTION
  if (e instanceof IOException) {
    e = RemoteExceptionHandler.checkIOException((IOException) e);
  }
  tries++;
  if (tries > 0 && (tries % this.numRetries) == 0) {
    // Check filesystem every so often.
    checkFileSystem();
  }
  if (this.stopRequested.get()) {
    LOG.info("Stop requested, clearing toDo despite exception");
    toDo.clear();
    continue;
  }
  LOG.warn("Attempt=" + tries, e);
  // No point retrying immediately; this is probably connection to
  // master issue. Doing below will cause us to sleep.
  lastMsg = System.currentTimeMillis();
{code}
Which produces the stack trace I pasted in this JIRA's description. IMO, and
taking into account the last comment in that code, we shouldn't retry. Instead,
we should catch LeaseStillHeldException separately from this big
catch(Exception) and treat it as an emergency shutdown.
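To make that concrete, here's a rough sketch of what I'm thinking. Since the
exception comes back over RPC and only turns into a LeaseStillHeldException
once checkIOException decodes it, the special-casing below is an instanceof
check after the decode rather than a literal separate catch clause, and the
shutdown lines are placeholders for whatever the real emergency-shutdown path
ends up being:
{code}
} catch (Exception e) { // FindBugs REC_CATCH_EXCEPTION
  if (e instanceof IOException) {
    e = RemoteExceptionHandler.checkIOException((IOException) e);
  }
  if (e instanceof Leases.LeaseStillHeldException) {
    // The master is already processing us as a dead server, so our
    // session is gone for good and retrying the report can never
    // succeed. Skip the retry/sleep logic and shut down right away.
    LOG.fatal("Master rejected our report, we are being processed " +
      "as a dead server. Aborting.", e);
    stopRequested.set(true); // placeholder: whatever the emergency
    break;                   // shutdown path ends up looking like
  }
  // ... existing retry/sleep handling stays below ...
{code}
That way the RS stops hammering the master with reports it will always reject,
instead of waiting for the ZK expiration watch to fire.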
> LeaseStillHeldException totally ignored by RS, wrongly named
> ------------------------------------------------------------
>
> Key: HBASE-2691
> URL: https://issues.apache.org/jira/browse/HBASE-2691
> Project: HBase
> Issue Type: Bug
> Reporter: Jean-Daniel Cryans
> Assignee: Jean-Daniel Cryans
> Fix For: 0.20.6, 0.21.0
>
>
> Currently region servers don't handle
> org.apache.hadoop.hbase.Leases$LeaseStillHeldException in any useful way,
> so what happens right now is that the RS tries to report to the master and
> this happens:
> {code}
> 2010-06-07 17:20:54,368 WARN [RegionServer:0] regionserver.HRegionServer(553): Attempt=1
> org.apache.hadoop.hbase.Leases$LeaseStillHeldException
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>     at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:94)
>     at org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:48)
>     at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:66)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:541)
>     at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:173)
>     at java.lang.Thread.run(Thread.java:637)
> {code}
> Then it will retry until the watch is triggered telling it that the session's
> expired! Instead, we should be a lot more proactive and initiate the abort
> procedure.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.