[ 
https://issues.apache.org/jira/browse/HBASE-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113312#comment-13113312
 ] 

subramanian raghunathan commented on HBASE-4462:
------------------------------------------------

Here is what I observed in the trunk code, in HCM.getRegionServerWithRetries() and in shouldRetry() on the callable:

{code}

    try {
      callable.instantiateServer(tries != 0);
      callable.beforeCall();
      return callable.call();
    } catch (Throwable t) {
      callable.shouldRetry(t);
      t = translateException(t);
      exceptions.add(t);
      if (tries == numRetries - 1) {
        throw new RetriesExhaustedException(callable.getServerName(),
            callable.getRegionName(), callable.getRow(), tries, exceptions);
      }
    } finally {
      callable.afterCall();
    }


  public void shouldRetry(Throwable throwable) throws IOException {
    if (this.callTimeout != HConstants.DEFAULT_HBASE_CLIENT_OPERATION_TIMEOUT)
      if (throwable instanceof SocketTimeoutException
          || (this.endTime - this.startTime > this.callTimeout)) {
        throw (SocketTimeoutException) (SocketTimeoutException) new SocketTimeoutException(
            "Call to access row '" + Bytes.toString(row) + "' on table '"
                + Bytes.toString(tableName)
                + "' failed on socket timeout exception: " + throwable)
            .initCause(throwable);
      } else {
        this.callTimeout = ((int) (this.endTime - this.startTime));
      }
  }

{code}

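shouldRetry() handles SocketTimeoutException specially. When a non-default
operation timeout is configured and the failure is a SocketTimeoutException
(or the elapsed time exceeds callTimeout), there are no further retries and no
pause between attempts: the exception is thrown straight back to the caller.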
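
To illustrate the control flow described above, here is a minimal, self-contained sketch. It is not the HBase code: RetrySketch, callWithRetries and the simplified shouldRetry are hypothetical names that only mimic the structure of the snippet, and the sketch assumes a non-default operation timeout is configured, so the timeout branch is always active.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical stand-in for the retry loop quoted above; not HBase code.
class RetrySketch {
  // Number of times the wrapped call was attempted in the last run.
  static int attempts;

  // Simplified shouldRetry: a SocketTimeoutException is rethrown at once,
  // any other failure returns normally so the loop may try again.
  static void shouldRetry(Throwable t) throws IOException {
    if (t instanceof SocketTimeoutException) {
      throw (SocketTimeoutException)
          new SocketTimeoutException("failed on socket timeout: " + t).initCause(t);
    }
  }

  static <T> T callWithRetries(Callable<T> call, int numRetries) throws IOException {
    attempts = 0;
    List<Throwable> exceptions = new ArrayList<>();
    for (int tries = 0; tries < numRetries; tries++) {
      try {
        attempts++;
        return call.call();
      } catch (Throwable t) {
        shouldRetry(t); // throws out of the loop for a socket timeout
        exceptions.add(t);
      }
    }
    throw new IOException("retries exhausted after " + exceptions.size() + " failures");
  }
}
```

With a callable that always times out, this loop makes exactly one attempt. With any other exception it runs all numRetries attempts before giving up.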

This was added as part of HBASE-2937 (Facilitate Timeouts In HBase Client).

But the same handling is not present in 0.90.x. Are the fix in HBASE-2937 and
the current JIRA related? If so, can we backport it?

Please correct me if I am wrong somewhere.

> Properly treating SocketTimeoutException
> ----------------------------------------
>
>                 Key: HBASE-4462
>                 URL: https://issues.apache.org/jira/browse/HBASE-4462
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>             Fix For: 0.92.0
>
>
> SocketTimeoutException is currently treated like any IOE inside of 
> HCM.getRegionServerWithRetries and I think this is a problem. This method 
> should only do retries in cases where we are pretty sure the operation will 
> complete, but with STE we already waited for (by default) 60 seconds and 
> nothing happened.
> I found this while debugging Douglas Campbell's problem on the mailing list 
> where it seemed like he was using the same scanner from multiple threads, but 
> actually it was just the same client doing retries while the first run didn't 
> even finish yet (that's another problem). You could see the first scanner, 
> then up to two other handlers waiting for it to finish in order to run 
> (because of the synchronization on RegionScanner).
> So what should we do? We could treat STE as a DoNotRetryException and let the 
> client deal with it, or we could retry only once.
> There's also the option of having a different behavior for get/put/icv/scan, 
> the issue with operations that modify a cell is that you don't know if the 
> operation completed or not (same when a RS dies hard after completing let's 
> say a Put but just before returning to the client).
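
As a rough sketch of the first two options in the description above (fail fast, or retry only once), a policy helper might look like the following. TimeoutRetryPolicy, Mode and maxAttempts are hypothetical names used for illustration only; they are not taken from HBase.

```java
import java.net.SocketTimeoutException;

// Hypothetical policy helper; names are illustrative, not part of HBase.
class TimeoutRetryPolicy {
  enum Mode { DO_NOT_RETRY, RETRY_ONCE }

  // Returns the total number of attempts allowed once failure t has been seen.
  static int maxAttempts(Throwable t, Mode mode, int numRetries) {
    if (t instanceof SocketTimeoutException) {
      // Either give up after the first attempt or allow a single retry,
      // never exceeding the configured retry count.
      return mode == Mode.DO_NOT_RETRY ? 1 : Math.min(2, numRetries);
    }
    // Every other failure keeps the configured number of attempts.
    return numRetries;
  }
}
```

This sketch does not cover the third option, a different behavior per operation type (get/put/icv/scan).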

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
