[ https://issues.apache.org/jira/browse/HBASE-15615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15279915#comment-15279915 ]
Mikhail Antonov commented on HBASE-15615:
-----------------------------------------

Thanks, and sorry for the delay here.

Yeah, that semantic on branch-1 looks better: "num retries is the max number of times server will ever see your request", basically. Good catch in AsyncProcess in master; in branch-1 and branch-1.3 it's set correctly.

I've looked at all the places where we call ConnectionUtils.getPauseTime on branch-1.3, and found a few places I'd like us to check more.

1) In HTableMultiplexer we set this.workerConf.setInt(HConstants.HBASE_CLIENT_RETRIES_NUMBER, 0); that doesn't look right to me. TestHTableMultiplexer passes with this patch, but that makes me think that the only place where AsyncProcess really uses numRetries is in ServerErrorTracker, and we may not need this codepath? Could we have a test in TestHCM#testErrorBackoffTimeCalculation to make sure we cover the case where someone passes in zero timeout / zero max retries?

2) In HBaseTestingUtility we use new RetryCounter(numRetries+1, (int)pause, TimeUnit.MICROSECONDS); - nit

Otherwise looks good to me.

> Wrong sleep time when RegionServerCallable need retry
> -----------------------------------------------------
>
>                 Key: HBASE-15615
>                 URL: https://issues.apache.org/jira/browse/HBASE-15615
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 1.0.0, 2.0.0, 1.1.0, 1.2.0, 1.3.0, 0.98.19
>            Reporter: Guanghao Zhang
>            Assignee: Guanghao Zhang
>             Fix For: 1.3.0
>
>         Attachments: HBASE-15615-branch-0.98.patch, 
> HBASE-15615-branch-1.0-v2.patch, HBASE-15615-branch-1.1-v2.patch, 
> HBASE-15615-branch-1.1-v2.patch, HBASE-15615-branch-1.patch, 
> HBASE-15615-v1.patch, HBASE-15615-v1.patch, HBASE-15615-v2.patch, 
> HBASE-15615-v2.patch, HBASE-15615-v3.patch, HBASE-15615.patch
>
>
> In RpcRetryingCallerImpl, it gets the pause time by expectedSleep = 
> callable.sleep(pause, tries + 1); and in RegionServerCallable, it gets the 
> pause time by sleep = ConnectionUtils.getPauseTime(pause, tries + 1). So 
> tries will be bumped up twice, and the pause time is 3 * hbase.client.pause 
> when tries is 0.
> RETRY_BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
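
For anyone following along, here is a minimal sketch of the double bump described in the issue. It is not the HBase source: the class RetryBackoffSketch and its method names are hypothetical, getPauseTime here is a simplified stand-in for ConnectionUtils.getPauseTime (it only clamps the index and multiplies, leaving out anything else the real method may do), and the 100 ms pause is an assumed value for hbase.client.pause. Only the RETRY_BACKOFF table and the two "tries + 1" calls are taken from the description.

```java
public class RetryBackoffSketch {
    // Multipliers as quoted in the issue description.
    static final int[] RETRY_BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};

    // Simplified stand-in for ConnectionUtils.getPauseTime (assumption):
    // clamp the index into the table and scale the base pause.
    static long getPauseTime(long pause, int tries) {
        int idx = Math.max(0, Math.min(tries, RETRY_BACKOFF.length - 1));
        return pause * RETRY_BACKOFF[idx];
    }

    // Callable side as described in the issue: bumps tries before the lookup.
    static long callableSleep(long pause, int tries) {
        return getPauseTime(pause, tries + 1);
    }

    // Caller side as described in the issue: bumps tries again before
    // delegating to the callable, so the index ends up at tries + 2.
    static long callerExpectedSleep(long pause, int tries) {
        return callableSleep(pause, tries + 1);
    }

    public static void main(String[] args) {
        long pause = 100; // assumed hbase.client.pause, in milliseconds

        // Double bump: tries = 0 lands on RETRY_BACKOFF[2] = 3.
        System.out.println(callerExpectedSleep(pause, 0)); // prints 300

        // Single bump for comparison: tries = 0 lands on RETRY_BACKOFF[1] = 2.
        System.out.println(callableSleep(pause, 0)); // prints 200
    }
}
```

With tries = 0 the double bump gives 3 * pause, matching the number in the description. Dropping either of the two increments moves the first retry back to index 1 of the table.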