[ 
https://issues.apache.org/jira/browse/HDFS-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056006#comment-14056006
 ] 

Liang Xie commented on HDFS-6631:
---------------------------------

I see. Comparing the attached "org.apache.hadoop.hdfs.TestPread-output.txt" 
with the log file from my dev box, the attached run did not trigger a real 
hedged read.
Only my log file contains lines like "Waited 50ms to read from 127.0.0.1:xxxxx 
spawning hedged read".
In your file, the execution sequence is:
->read from 127.0.0.1:53908                     <- here the counter is 1
->throw ChecksumException
->read from 127.0.0.1:53919                     <- here the counter is 2
->return result, line 1127
That means both reads went through the "if (futures.isEmpty()) {" branch 
(L1112).

So the root question is: if we set hedged.read.threshold = 50ms, and 
Mockito.doAnswer has a "Thread.sleep(50+1)", what does this statement return?
{code}
          Future<ByteBuffer> future = hedgedService.poll(
              dfsClient.getHedgedReadTimeout(), TimeUnit.MILLISECONDS);
{code}

On my dev box, it behaves just as the Javadoc says:
{code}
Retrieves and removes the Future representing the next completed task, waiting 
if necessary up to the specified wait time if none are yet present.
Parameters:
        timeout how long to wait before giving up, in units of unit
        unit a TimeUnit determining how to interpret the timeout parameter
Returns:
        the Future representing the next completed task or null if the 
        specified waiting time elapses before one is present
Throws:
        InterruptedException - if interrupted while waiting
{code}

So the poll times out before the 51ms mocked read finishes, and the future is 
null.
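
A minimal, standalone sketch of that timeout behavior, using only 
java.util.concurrent. It is not the DFSInputStream code: the class name 
PollTimeoutSketch and the method pollTimesOut are made up for illustration, 
and the task simply sleeps in place of the mocked read.

```java
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration class; not part of Hadoop.
class PollTimeoutSketch {

  /** Returns true if poll() gave up (returned null) before the task finished. */
  static boolean pollTimesOut(final long taskSleepMs, long pollTimeoutMs)
      throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(1);
    try {
      CompletionService<String> service =
          new ExecutorCompletionService<String>(pool);
      // Stand-in for the mocked read: sleep, then return a result.
      service.submit(() -> {
        Thread.sleep(taskSleepMs);
        return "done";
      });
      // Same call shape as hedgedService.poll(timeout, MILLISECONDS).
      Future<String> future =
          service.poll(pollTimeoutMs, TimeUnit.MILLISECONDS);
      return future == null;
    } finally {
      pool.shutdownNow(); // interrupt the sleeping task so the JVM can exit
    }
  }

  public static void main(String[] args) throws Exception {
    // Wide margin: the task sleeps far longer than the poll timeout.
    System.out.println("wide margin, timed out: " + pollTimesOut(2000, 50));
    // 1ms margin, as in the test: the result may vary by OS and timer.
    System.out.println("1ms margin, timed out:  " + pollTimesOut(51, 50));
  }
}
```

With a wide margin the null result is dependable; with a 1ms margin the 
outcome depends on how the platform schedules the two threads.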

But on Chris's box, the exception from the thread pool task surfaces first, so 
execution goes directly to L1140: "catch (ExecutionException e)".

So, per my current understanding, this is related to OS thread scheduling 
(timer granularity); we probably need to enlarge the Mockito sleep interval.
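
The two outcomes can be reproduced in isolation with the rough sketch below. 
It rests on assumptions: FailedTaskSketch and outcome are hypothetical names, 
and a plain IOException stands in for the checksum failure thrown by the 
mocked read. If the failing task completes before the poll timeout, poll() 
returns its Future and get() throws ExecutionException; otherwise poll() 
returns null.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration class; not part of Hadoop.
class FailedTaskSketch {

  /** Reports which branch a poll-then-get sequence ends up in. */
  static String outcome(final long taskSleepMs, long pollTimeoutMs)
      throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(1);
    try {
      CompletionService<String> service =
          new ExecutorCompletionService<String>(pool);
      // Stand-in for the mocked read: sleep, then fail.
      Callable<String> failingRead = () -> {
        Thread.sleep(taskSleepMs);
        throw new IOException("simulated checksum failure");
      };
      service.submit(failingRead);

      Future<String> future =
          service.poll(pollTimeoutMs, TimeUnit.MILLISECONDS);
      if (future == null) {
        return "timeout"; // the path where a hedged read would be spawned
      }
      try {
        future.get();
        return "success";
      } catch (ExecutionException e) {
        // The task failed before the timeout elapsed.
        return "failed: " + e.getCause().getMessage();
      }
    } finally {
      pool.shutdownNow(); // interrupt the task if it is still sleeping
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(outcome(2000, 50)); // wide margin: task outlives poll
    System.out.println(outcome(0, 2000));  // task fails before the timeout
    System.out.println(outcome(51, 50));   // 1ms margin: either branch
  }
}
```

A sleep well above the threshold keeps the test on the timeout branch 
regardless of timer granularity.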

> TestPread#testHedgedReadLoopTooManyTimes fails intermittently.
> --------------------------------------------------------------
>
>                 Key: HDFS-6631
>                 URL: https://issues.apache.org/jira/browse/HDFS-6631
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client, test
>    Affects Versions: 3.0.0, 2.5.0
>            Reporter: Chris Nauroth
>         Attachments: org.apache.hadoop.hdfs.TestPread-output.txt
>
>
> {{TestPread#testHedgedReadLoopTooManyTimes}} fails intermittently.  It looks 
> like a race condition on counting the expected number of loop iterations.  I 
> can repro the test failure more consistently on Windows.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
