[ https://issues.apache.org/jira/browse/HDFS-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056006#comment-14056006 ]
Liang Xie commented on HDFS-6631:
---------------------------------
I see. Comparing the attached "org.apache.hadoop.hdfs.TestPread-output.txt"
with the log file from my dev box, the run in the attachment did not trigger a
real hedged read.
Only my log file contains a line like "Waited 50ms to read from
127.0.0.1:xxxxx spawning hedged read".
In your file, the execution sequence is:
->read from 127.0.0.1:53908 <- here the counter is 1
->throw ChecksumException
->read from 127.0.0.1:53919 <- here the counter is 2
->return result, line 1127
That means both reads went through the "if (futures.isEmpty()) {" branch
(L1112).
So the root question is: with hedged.read.threshold = 50ms and a
"Thread.sleep(50+1)" inside Mockito.doAnswer, how does this statement behave?
{code}
Future<ByteBuffer> future = hedgedService.poll(
dfsClient.getHedgedReadTimeout(), TimeUnit.MILLISECONDS);
{code}
On my dev box, it behaved just as the Javadoc says:
{code}
Retrieves and removes the Future representing the next completed task,
waiting if necessary up to the specified wait time if none are yet present.
Parameters:
  timeout - how long to wait before giving up, in units of unit
  unit - a TimeUnit determining how to interpret the timeout parameter
Returns:
  the Future representing the next completed task or null if the specified
  waiting time elapses before one is present
Throws:
  InterruptedException - if interrupted while waiting
{code}
So on my box the future is null.
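To make the timeout behaviour concrete, here is a minimal standalone sketch. It is not taken from the HDFS code: the class name PollTimeoutSketch and the helper pollTimesOut are made up for illustration, and the sleep values are deliberately far from the poll timeout so the outcome does not depend on scheduling.

```java
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical helper class, for illustration only (not part of HDFS).
public class PollTimeoutSketch {

  /**
   * Submits one task that sleeps for taskSleepMs, then polls the completion
   * service for at most pollTimeoutMs.
   * Returns true if poll() gave up and returned null.
   */
  static boolean pollTimesOut(long taskSleepMs, long pollTimeoutMs)
      throws InterruptedException {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      ExecutorCompletionService<String> service =
          new ExecutorCompletionService<>(pool);
      service.submit(() -> {
        Thread.sleep(taskSleepMs); // stands in for the slow mocked read
        return "done";
      });
      Future<String> future =
          service.poll(pollTimeoutMs, TimeUnit.MILLISECONDS);
      return future == null;
    } finally {
      pool.shutdownNow(); // interrupt the sleeping task so the JVM can exit
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // Task is much slower than the poll timeout: poll() should return null.
    System.out.println("slow task, timed out: " + pollTimesOut(1000, 50));
    // Task finishes well inside the timeout: poll() should return a Future.
    System.out.println("fast task, timed out: " + pollTimesOut(0, 1000));
  }
}
```

With a margin this wide the first call should report a timeout and the second should not. With a margin of only 1ms, as in the test, either outcome looks possible depending on timer and scheduler granularity.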
But on Chris's box, the exception from the thread pool surfaced first, so
execution went directly to L1140: "catch (ExecutionException e)".
So, as far as I currently understand, this is related to OS thread scheduling
(granularity), and we probably need to enlarge the Mockito sleep interval.
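A rough sketch of the two branches discussed above, again outside HDFS. The class HedgedPollBranches, the method classify, and the returned labels are hypothetical names chosen for this example; only the java.util.concurrent calls are real. It shows that a task which fails before the poll timeout elapses comes back as a non-null Future whose get() throws ExecutionException, while a task that outlasts the timeout by a wide margin yields null.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical helper class, for illustration only (not part of HDFS).
public class HedgedPollBranches {

  /** Runs one task and reports which branch a poll-then-get caller would take. */
  static String classify(Callable<String> task, long pollTimeoutMs)
      throws InterruptedException {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      ExecutorCompletionService<String> service =
          new ExecutorCompletionService<>(pool);
      service.submit(task);
      Future<String> future =
          service.poll(pollTimeoutMs, TimeUnit.MILLISECONDS);
      if (future == null) {
        return "timeout"; // the branch where a hedged read would be spawned
      }
      try {
        future.get(); // already completed, so this does not block
        return "result";
      } catch (ExecutionException e) {
        return "execution-exception"; // the task failed before the timeout
      }
    } finally {
      pool.shutdownNow(); // interrupt any task that is still sleeping
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // Task fails immediately, long before the poll timeout.
    System.out.println(classify(() -> {
      throw new java.io.IOException("simulated read failure");
    }, 1000));
    // Task sleeps far longer than the poll timeout.
    System.out.println(classify(() -> {
      Thread.sleep(1000);
      return "late";
    }, 50));
  }
}
```

The sleeps here differ from the timeout by hundreds of milliseconds on purpose. That is the same idea as enlarging the Mockito sleep interval: the wider the gap, the less the branch taken depends on the scheduler.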
> TestPread#testHedgedReadLoopTooManyTimes fails intermittently.
> --------------------------------------------------------------
>
> Key: HDFS-6631
> URL: https://issues.apache.org/jira/browse/HDFS-6631
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs-client, test
> Affects Versions: 3.0.0, 2.5.0
> Reporter: Chris Nauroth
> Attachments: org.apache.hadoop.hdfs.TestPread-output.txt
>
>
> {{TestPread#testHedgedReadLoopTooManyTimes}} fails intermittently. It looks
> like a race condition on counting the expected number of loop iterations. I
> can repro the test failure more consistently on Windows.