[ https://issues.apache.org/jira/browse/HBASE-13732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562042#comment-14562042 ]
Enis Soztutar commented on HBASE-13732:
---------------------------------------
bq. if the caller does not change the default maxSleepTime (default is -1), it
just behaves like ExponentialBackoffPolicy. So we should be OK for any other
existing callers or any new callers who wish to still use
ExponentialBackoffPolicy.
ok, sounds good.
bq. possible dup of HBASE-13574?
Seems that this patch is better than the one in HBASE-13574.
I'll commit shortly.
> TestHBaseFsck#testParallelWithRetriesHbck fails intermittently
> --------------------------------------------------------------
>
> Key: HBASE-13732
> URL: https://issues.apache.org/jira/browse/HBASE-13732
> Project: HBase
> Issue Type: Bug
> Components: hbck, test
> Affects Versions: 2.0.0, 1.1.0, 1.2.0
> Reporter: Stephen Yuan Jiang
> Assignee: Stephen Yuan Jiang
> Priority: Minor
> Fix For: 2.0.0, 1.2.0, 1.1.1
>
> Attachments: HBASE-13732.patch
>
>
> TestHBaseFsck#testParallelWithRetriesHbck fails intermittently (especially
> in the Windows environment) with "java.io.IOException: Duplicate hbck - Abort":
> {noformat}
> java.util.concurrent.ExecutionException: java.io.IOException: Duplicate hbck - Abort
>     at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
>     at java.util.concurrent.FutureTask.get(FutureTask.java:111)
>     at org.apache.hadoop.hbase.util.TestHBaseFsck.testParallelWithRetriesHbck(TestHBaseFsck.java:644)
> Caused by: java.io.IOException: Duplicate hbck - Abort
>     at org.apache.hadoop.hbase.util.HBaseFsck.connect(HBaseFsck.java:484)
>     at org.apache.hadoop.hbase.util.hbck.HbckTestingUtil.doFsck(HbckTestingUtil.java:53)
>     at org.apache.hadoop.hbase.util.hbck.HbckTestingUtil.doFsck(HbckTestingUtil.java:43)
>     at org.apache.hadoop.hbase.util.hbck.HbckTestingUtil.doFsck(HbckTestingUtil.java:38)
>     at org.apache.hadoop.hbase.util.TestHBaseFsck$2RunHbck.call(TestHBaseFsck.java:635)
>     at org.apache.hadoop.hbase.util.TestHBaseFsck$2RunHbck.call(TestHBaseFsck.java:628)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>     at java.lang.Thread.run(Thread.java:722)
> {noformat}
> HBASE-13591 tried to address this issue. It improved the pass rate in the
> Linux environment (after the fix, I could not reproduce the failure on my
> machine); however, the test still failed intermittently in the Windows
> environment during testing of the 1.1 release.
> Looking at the code, it uses ExponentialBackoffPolicy between retries (a
> 200ms sleep after the first failed attempt to acquire the lock in ZK, then
> 400ms, then 800ms, etc.). Therefore, even if the first hbck run completes,
> the second hbck run can still fail because of the long sleep time.
> The proposed fix is to use ExponentialBackoffPolicyWithLimit and cap the
> max sleep time at some small number (e.g. 5 seconds; it should be
> configurable). This would make the test more robust.
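For illustration, the difference between the uncapped and capped backoff described above can be sketched as follows. This is a standalone sketch, not the HBase RetryCounter code: the class name BackoffSketch and the method sleepMillis are hypothetical, and the assumption that the sleep doubles from a 200ms base is taken from the description. A negative limit is treated as "no cap", mirroring the -1 default mentioned in the comment.

```java
/**
 * Illustrative sketch only; not the HBase implementation.
 * All names in this class are hypothetical.
 */
final class BackoffSketch {
    private BackoffSketch() {}

    /**
     * Computes the sleep before the next retry.
     *
     * @param baseMillis     sleep after the first failed attempt (200 in the description)
     * @param attempt        1-based number of the failed attempt
     * @param maxSleepMillis cap on the sleep; a negative value means "no cap" (assumed)
     */
    static long sleepMillis(long baseMillis, int attempt, long maxSleepMillis) {
        // Clamp the exponent so the shift stays well defined for large attempt counts.
        int exponent = Math.max(0, Math.min(attempt - 1, 30));
        long uncapped = baseMillis * (1L << exponent);
        if (maxSleepMillis < 0) {
            // No limit configured: plain exponential backoff.
            return uncapped;
        }
        return Math.min(uncapped, maxSleepMillis);
    }

    public static void main(String[] args) {
        // Compare plain exponential backoff with a 5 second cap.
        for (int attempt = 1; attempt <= 8; attempt++) {
            System.out.printf("attempt %d: uncapped=%dms capped=%dms%n",
                attempt,
                sleepMillis(200, attempt, -1),
                sleepMillis(200, attempt, 5000));
        }
    }
}
```

Under these assumptions, a 200ms base with a 5 second cap gives sleeps of 200, 400, 800, 1600 and 3200ms, then 5000ms for every later attempt, so a waiting hbck run rechecks the lock at least every 5 seconds instead of sleeping for ever longer intervals.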