[
https://issues.apache.org/jira/browse/HBASE-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113312#comment-13113312
]
subramanian raghunathan commented on HBASE-4462:
------------------------------------------------
Here is what I observed in the trunk code of HCM.getRegionServerWithRetries():
{code}
// Excerpt from the retry loop in HCM.getRegionServerWithRetries()
try {
  callable.instantiateServer(tries != 0);
  callable.beforeCall();
  return callable.call();
} catch (Throwable t) {
  callable.shouldRetry(t);
  t = translateException(t);
  exceptions.add(t);
  if (tries == numRetries - 1) {
    throw new RetriesExhaustedException(callable.getServerName(),
        callable.getRegionName(), callable.getRow(), tries, exceptions);
  }
} finally {
  callable.afterCall();
}

// shouldRetry(), as called on the callable in the catch block above
public void shouldRetry(Throwable throwable) throws IOException {
  if (this.callTimeout != HConstants.DEFAULT_HBASE_CLIENT_OPERATION_TIMEOUT)
    if (throwable instanceof SocketTimeoutException
        || (this.endTime - this.startTime > this.callTimeout)) {
      throw (SocketTimeoutException) (SocketTimeoutException) new
          SocketTimeoutException(
          "Call to access row '" + Bytes.toString(row) + "' on table '"
              + Bytes.toString(tableName)
              + "' failed on socket timeout exception: " + throwable)
          .initCause(throwable);
    } else {
      this.callTimeout = ((int) (this.endTime - this.startTime));
    }
}
{code}
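shouldRetry handles SocketTimeoutException specially: when the failure is a
SocketTimeoutException, there is no retry count and no retry pause, and the
exception is thrown straight back to the caller. (Per the outer if, this only
applies when callTimeout differs from
HConstants.DEFAULT_HBASE_CLIENT_OPERATION_TIMEOUT.)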
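To make that control flow easier to follow, here is a minimal, self-contained
sketch of the same idea. It is not the HBase code: the class
TimeoutAwareRetry and the method callWithRetries are hypothetical names, it
uses a plain java.util.concurrent.Callable instead of the HBase callable, and
it omits the pause between attempts that a real client would have.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

// Hypothetical illustration only; these are not HBase classes.
final class TimeoutAwareRetry {
  private TimeoutAwareRetry() {}

  /**
   * Runs op up to numRetries times. A SocketTimeoutException is rethrown
   * on the spot (no further attempts); any other exception is retried
   * until the attempts are used up.
   */
  static <T> T callWithRetries(Callable<T> op, int numRetries) throws IOException {
    if (numRetries < 1) {
      throw new IllegalArgumentException("numRetries must be >= 1");
    }
    Exception lastFailure = null;
    for (int tries = 0; tries < numRetries; tries++) {
      try {
        return op.call();
      } catch (SocketTimeoutException e) {
        // We already waited a full timeout; give the caller the exception
        // instead of waiting again.
        throw e;
      } catch (Exception e) {
        // Any other failure: remember it and try again (no pause in this sketch).
        lastFailure = e;
      }
    }
    throw new IOException(
        "Retries exhausted after " + numRetries + " attempts", lastFailure);
  }
}
```

With this sketch, an operation that times out is attempted exactly once,
while an operation that fails with some other exception is attempted up to
numRetries times.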
This handling was added as part of HBASE-2937 (Facilitate Timeouts In HBase
Client), but it is not present in 0.90.x. Is the fix in HBASE-2937 related to
the current JIRA? If so, can we backport it?
Please correct me if I am wrong somewhere.
> Properly treating SocketTimeoutException
> ----------------------------------------
>
> Key: HBASE-4462
> URL: https://issues.apache.org/jira/browse/HBASE-4462
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 0.90.4
> Reporter: Jean-Daniel Cryans
> Fix For: 0.92.0
>
>
> SocketTimeoutException is currently treated like any IOE inside of
> HCM.getRegionServerWithRetries and I think this is a problem. This method
> should only do retries in cases where we are pretty sure the operation will
> complete, but with STE we already waited for (by default) 60 seconds and
> nothing happened.
> I found this while debugging Douglas Campbell's problem on the mailing list
> where it seemed like he was using the same scanner from multiple threads, but
> actually it was just the same client doing retries while the first run didn't
> even finish yet (that's another problem). You could see the first scanner,
> then up to two other handlers waiting for it to finish in order to run
> (because of the synchronization on RegionScanner).
> So what should we do? We could treat STE as a DoNotRetryException and let the
> client deal with it, or we could retry only once.
> There's also the option of having a different behavior for get/put/icv/scan,
> the issue with operations that modify a cell is that you don't know if the
> operation completed or not (same when a RS dies hard after completing let's
> say a Put but just before returning to the client).
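The options listed in the description above (never retry after a timeout,
retry only once, or vary the behavior per operation) could be expressed as a
small policy table. The sketch below is only one possible choice, not
something decided in this issue: the class TimeoutRetryPolicy, the Op enum,
and the specific limits (one retry for reads, none for operations that modify
a cell) are all assumptions made for illustration.

```java
// Hypothetical illustration only; names and limits are assumptions.
final class TimeoutRetryPolicy {
  private TimeoutRetryPolicy() {}

  /** The operation kinds mentioned in the description (icv = INCREMENT). */
  enum Op { GET, SCAN, PUT, INCREMENT }

  /** How many extra attempts are allowed after a socket timeout. */
  static int maxRetriesAfterTimeout(Op op) {
    switch (op) {
      case GET:
      case SCAN:
        // Reads do not change data, so this sketch allows one more attempt.
        return 1;
      case PUT:
      case INCREMENT:
        // After a timeout we cannot tell whether the write was applied,
        // so this sketch leaves the decision to the client.
        return 0;
      default:
        throw new IllegalArgumentException("Unknown operation: " + op);
    }
  }

  /**
   * timeoutsSoFar counts the timeouts already seen for this call,
   * including the one that just happened.
   */
  static boolean mayRetry(Op op, int timeoutsSoFar) {
    return timeoutsSoFar <= maxRetriesAfterTimeout(op);
  }
}
```

Under these assumed limits, a Get that times out once may be retried, while a
Put that times out is reported to the client right away.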