[ https://issues.apache.org/jira/browse/HBASE-13732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562212#comment-14562212 ]
Hudson commented on HBASE-13732:
--------------------------------
FAILURE: Integrated in HBase-1.1 #506 (See
[https://builds.apache.org/job/HBase-1.1/506/])
HBASE-13732 TestHBaseFsck#testParallelWithRetriesHbck fails intermittently
(Stephen Yuan Jiang) (enis: rev ccf5556e1f02d5e22f287f2bdbbc83b258f49eaf)
* hbase-common/src/main/java/org/apache/hadoop/hbase/util/RetryCounterFactory.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java
> TestHBaseFsck#testParallelWithRetriesHbck fails intermittently
> --------------------------------------------------------------
>
> Key: HBASE-13732
> URL: https://issues.apache.org/jira/browse/HBASE-13732
> Project: HBase
> Issue Type: Bug
> Components: hbck, test
> Affects Versions: 2.0.0, 1.1.0, 1.2.0
> Reporter: Stephen Yuan Jiang
> Assignee: Stephen Yuan Jiang
> Priority: Minor
> Fix For: 2.0.0, 1.2.0, 1.1.1
>
> Attachments: HBASE-13732.patch
>
>
> TestHBaseFsck#testParallelWithRetriesHbck fails intermittently (especially
> in Windows environments) with "java.io.IOException: Duplicate hbck - Abort":
> {noformat}
> java.util.concurrent.ExecutionException: java.io.IOException: Duplicate hbck - Abort
> at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
> at java.util.concurrent.FutureTask.get(FutureTask.java:111)
> at org.apache.hadoop.hbase.util.TestHBaseFsck.testParallelWithRetriesHbck(TestHBaseFsck.java:644)
> Caused by: java.io.IOException: Duplicate hbck - Abort
> at org.apache.hadoop.hbase.util.HBaseFsck.connect(HBaseFsck.java:484)
> at org.apache.hadoop.hbase.util.hbck.HbckTestingUtil.doFsck(HbckTestingUtil.java:53)
> at org.apache.hadoop.hbase.util.hbck.HbckTestingUtil.doFsck(HbckTestingUtil.java:43)
> at org.apache.hadoop.hbase.util.hbck.HbckTestingUtil.doFsck(HbckTestingUtil.java:38)
> at org.apache.hadoop.hbase.util.TestHBaseFsck$2RunHbck.call(TestHBaseFsck.java:635)
> at org.apache.hadoop.hbase.util.TestHBaseFsck$2RunHbck.call(TestHBaseFsck.java:628)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)
> {noformat}
> HBASE-13591 tried to address this issue. It did improve the pass rate in
> Linux environments (after the fix, I could not reproduce the failure on my
> machine); however, the test still failed intermittently in Windows
> environments during testing of the 1.1 release.
> Looking at the code, it uses the ExponentialBackoffPolicy between retries
> (a 200ms sleep after the first failed attempt to acquire the lock in ZK,
> then 400ms, then 800ms, etc.). Therefore, even if the first hbck run
> completes, the second hbck run can still fail because of the long sleep
> times.
> The proposal to fix the problem is to use ExponentialBackoffPolicyWithLimit
> and cap the maximum sleep time at some small value (e.g. 5 seconds; it
> should be configurable). This would make the test more robust.
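
To make the backoff arithmetic in the description concrete, here is a minimal sketch of the two sleep schedules. It assumes the 200ms starting interval and the 5 second cap mentioned above. The class and method names (BackoffSketch, exponentialSleepMillis, cappedSleepMillis) are hypothetical illustrations, not HBase's RetryCounter API and not the code in the attached patch.

```java
public class BackoffSketch {

    // Hypothetical helper: uncapped exponential backoff.
    // attempt is 1-based: attempt 1 sleeps baseMillis, attempt 2 sleeps 2x, and so on.
    static long exponentialSleepMillis(long baseMillis, int attempt) {
        // Clamp the shift so the multiplication cannot overflow for large attempt counts.
        int shift = Math.min(Math.max(attempt - 1, 0), 30);
        return baseMillis * (1L << shift);
    }

    // Hypothetical helper: the same schedule, but never sleeping longer than maxMillis.
    static long cappedSleepMillis(long baseMillis, int attempt, long maxMillis) {
        return Math.min(exponentialSleepMillis(baseMillis, attempt), maxMillis);
    }

    public static void main(String[] args) {
        long baseMillis = 200L;   // starting sleep time from the issue description
        long maxMillis = 5000L;   // example cap from the issue description (assumed configurable)
        for (int attempt = 1; attempt <= 8; attempt++) {
            System.out.printf("attempt %d: uncapped=%dms capped=%dms%n",
                attempt,
                exponentialSleepMillis(baseMillis, attempt),
                cappedSleepMillis(baseMillis, attempt, maxMillis));
        }
    }
}
```

Under these assumptions the uncapped schedule first exceeds 5 seconds on the sixth attempt (6400ms), and it keeps doubling after that, while the capped schedule stays at 5000ms from that point on.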
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)