[ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15360593#comment-15360593
 ] 

Konstantin Ryakhovskiy commented on HBASE-14422:
------------------------------------------------

[~stack] I reproduced the issue by adding a Thread.sleep(..) before 
latch.await(). The sleep simulates sufficiently bad hardware, for example a 
long context switch.
The log: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2506/testReport/org.apache.hadoop.hbase.client/TestFastFailWithoutTestUtil/testPreemptiveFastFailException50Times/
At iteration #8 (see the log line "Time-limited test #7"), Thread2 is in 
FastFail mode (TT-2 difference=1). This means that when the code is in the 
method PreemptiveFastFailInterceptor#inFastFail(), 
EnvironmentEdge.currentTimeMillis is 1ms greater than (time of the first 
failure + fast fail threshold).
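
As a rough illustration of why a 1ms difference is enough, the check can be 
sketched as a comparison against (time of the first failure + fast fail 
threshold). This is a simplified stand-in, not the actual 
PreemptiveFastFailInterceptor code: the class name, the method signature, the 
strict comparison at the boundary, and the numbers are all assumptions made 
for the example.

```java
public class FastFailTimingSketch {
    // Hypothetical stand-in for the check: fast-fail mode counts as active once
    // the clock has moved strictly past (first failure + threshold).
    // The strict ">" at the boundary is an assumption, not taken from HBase.
    static boolean inFastFail(long nowMs, long firstFailureMs, long thresholdMs) {
        return nowMs > firstFailureMs + thresholdMs;
    }

    public static void main(String[] args) {
        long firstFailureMs = 1_000L; // made-up timestamp of the first failure
        long thresholdMs = 60L;       // made-up fast fail threshold

        // Exactly at the boundary: not yet in fast-fail mode.
        System.out.println(inFastFail(1_060L, firstFailureMs, thresholdMs)); // false
        // 1ms later (difference=1): already in fast-fail mode.
        System.out.println(inFastFail(1_061L, firstFailureMs, thresholdMs)); // true
    }
}
```

A delayed thread (the Thread.sleep(..) above) only has to read the clock 1ms 
past the boundary to land on the other side of this check.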

To make the test more robust, we can increment the done counter without 
verification. So, instead of the line:
if (pffe) done.incrementAndGet();
we can write directly:
done.incrementAndGet();

Will that work from your perspective?
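
To show the effect of the two variants, here is a minimal sketch that counts 
iterations with and without the check. It assumes that pffe flags whether an 
iteration observed the fast-fail exception; the class, the helper method, and 
the sample data are hypothetical and are not part of the test itself.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class DoneCounterSketch {
    // Hypothetical helper: pffeSeen[i] says whether iteration i observed the
    // fast-fail exception. With requirePffe=true the counter is incremented
    // only for those iterations; with requirePffe=false it is incremented
    // for every iteration.
    static int countDone(boolean[] pffeSeen, boolean requirePffe) {
        AtomicInteger done = new AtomicInteger();
        for (boolean pffe : pffeSeen) {
            if (!requirePffe || pffe) {
                done.incrementAndGet();
            }
        }
        return done.get();
    }

    public static void main(String[] args) {
        // Made-up run: the third iteration did not observe the exception.
        boolean[] pffeSeen = {true, true, false, true};

        System.out.println(countDone(pffeSeen, true));  // 3: conditional increment
        System.out.println(countDone(pffeSeen, false)); // 4: unconditional increment
    }
}
```

With the conditional form, one iteration on the wrong side of the timing 
boundary leaves the counter short. The unconditional form always reaches the 
expected count, at the cost of no longer verifying that the exception was seen.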


> Fix TestFastFailWithoutTestUtil
> -------------------------------
>
>                 Key: HBASE-14422
>                 URL: https://issues.apache.org/jira/browse/HBASE-14422
>             Project: HBase
>          Issue Type: Task
>          Components: test
>            Reporter: stack
>            Assignee: Konstantin Ryakhovskiy
>            Priority: Minor
>              Labels: beginner
>         Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, HBASE-14422.master.007.patch, 
> HBASE-14422.master.008.patch, HBASE-14422.master.009.patch, 
> HBASE-14422.master.010.patch, HBASE-14422.master.011.patch, 
> HBASE-14422.master.012.patch, HBASE-14422.master.013.patch, 
> HBASE-14422.master.014.patch, HBASE-14422.master.015.patch, 
> HBASE-14422.master.016.patch, HBASE-14422.master.017.patch, log.txt, trace.log
>
>
> TestFastFailWithoutTestUtil has a unit test, 
> testInterceptorIntercept50Times. Usually it passes, but on occasion the 
> latching between thread 1 and thread 2 goes awry and the test hangs. 
> It depends on the hardware, but it seems to happen about one in 
> four runs here on an internal rig.
> HBASE-14421 changed the wait-on-latch to time out, do a thread dump, and 
> just let the test keep going.
> This issue is about digging in to figure out why the latches hang and 
> then fixing it so the test doesn't have to have the latch timeout. Hopefully 
> the thread dump helps.
> This one could be hard to fix since it is not easy to reproduce. Marking it 
> beginner anyway.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
