[
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15360593#comment-15360593
]
Konstantin Ryakhovskiy commented on HBASE-14422:
------------------------------------------------
[~stack] the issue reproduces when I add a Thread.sleep(..) before
latch.await().
I added this Thread.sleep(..) to simulate slow hardware, for example a long
context switch.
The log:
https://builds.apache.org/job/PreCommit-HBASE-Build/2506/testReport/org.apache.hadoop.hbase.client/TestFastFailWithoutTestUtil/testPreemptiveFastFailException50Times/
At iteration #8 (see the line "Time-limited test #7"), Thread2 is in FastFail
mode (TT-2 difference=1). This means that when the code is in the method
PreemptiveFastFailInterceptor#inFastFail(),
EnvironmentEdge.currentTimeMillis is 1 ms greater than (time of the first
failure + fast fail threshold).
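The boundary condition described above can be illustrated with a minimal sketch. The class FastFailWindowSketch, its method, and the sample timestamps are hypothetical and are not the HBase implementation; they only show how a 1 ms difference flips the result of a strict "greater than" comparison.

```java
public class FastFailWindowSketch {
    // Hypothetical stand-in for the timing check described in the comment:
    // fast-fail mode applies once the current time is strictly greater than
    // (time of the first failure + fast fail threshold).
    public static boolean inFastFail(long nowMs, long firstFailureMs, long thresholdMs) {
        return nowMs > firstFailureMs + thresholdMs;
    }

    public static void main(String[] args) {
        long firstFailureMs = 1_000L; // assumed sample value
        long thresholdMs = 100L;      // assumed sample value

        // Exactly at the boundary: not yet in fast-fail mode.
        System.out.println(inFastFail(1_100L, firstFailureMs, thresholdMs)); // prints false

        // 1 ms later (e.g. after a long context switch): in fast-fail mode.
        System.out.println(inFastFail(1_101L, firstFailureMs, thresholdMs)); // prints true
    }
}
```

A thread delayed by even 1 ms can therefore observe a different mode than a thread that ran on time, which is consistent with the intermittent failure.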
To make the test more robust, we can increment the done counter without
verification. Instead of the line:
if (pffe) done.incrementAndGet();
we can write directly:
done.incrementAndGet();
Would that work from your perspective?
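The difference between the two variants can be sketched as follows. The harness DoneCounterSketch, its run method, and the boolean flags are hypothetical; only the names pffe and done come from the comment. It is a simplified illustration, not the test code itself.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class DoneCounterSketch {
    // Starts one worker thread per flag and returns how many workers were
    // counted as done. Each flag says whether that worker observed the
    // fast-fail exception (pffe).
    public static int run(boolean[] sawFastFail, boolean requireFastFail) {
        AtomicInteger done = new AtomicInteger();
        CountDownLatch finished = new CountDownLatch(sawFastFail.length);
        for (boolean pffe : sawFastFail) {
            new Thread(() -> {
                try {
                    // requireFastFail == true mirrors: if (pffe) done.incrementAndGet();
                    // requireFastFail == false mirrors: done.incrementAndGet();
                    if (!requireFastFail || pffe) {
                        done.incrementAndGet();
                    }
                } finally {
                    finished.countDown();
                }
            }).start();
        }
        try {
            finished.await(); // wait until every worker has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return done.get();
    }

    public static void main(String[] args) {
        boolean[] flags = {true, false}; // the second worker missed the fast-fail window
        System.out.println(run(flags, true));  // prints 1
        System.out.println(run(flags, false)); // prints 2
    }
}
```

With the unconditional increment the counter always reaches the number of workers, so the outcome no longer depends on timing; the trade-off is that the counter no longer verifies that the exception was seen.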
> Fix TestFastFailWithoutTestUtil
> -------------------------------
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
> Issue Type: Task
> Components: test
> Reporter: stack
> Assignee: Konstantin Ryakhovskiy
> Priority: Minor
> Labels: beginner
> Attachments: HBASE-14422.master.001.patch,
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch,
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch,
> HBASE-14422.master.006.patch, HBASE-14422.master.007.patch,
> HBASE-14422.master.008.patch, HBASE-14422.master.009.patch,
> HBASE-14422.master.010.patch, HBASE-14422.master.011.patch,
> HBASE-14422.master.012.patch, HBASE-14422.master.013.patch,
> HBASE-14422.master.014.patch, HBASE-14422.master.015.patch,
> HBASE-14422.master.016.patch, HBASE-14422.master.017.patch, log.txt, trace.log
>
>
> TestFastFailWithoutTestUtil has a unit test,
> testInterceptorIntercept50Times. Usually it passes, but on occasion the
> latching between thread 1 and thread 2 goes awry and the test hangs. It
> depends on the hardware, but it seems to happen about one in four runs here
> on an internal rig.
> HBASE-14421 changed the wait-on-latch to time out, do a thread dump, and
> just let the test keep going.
> This issue is about digging in to figure out why the test hangs on the
> latches and then fixing it so the test doesn't need the latch timeout.
> Hopefully the thread dump helps.
> This one could be hard to fix since it is not easy to reproduce. Marking it
> beginner anyways.