[ https://issues.apache.org/jira/browse/HDFS-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041856#comment-14041856 ]

Liang Xie commented on HDFS-6591:
---------------------------------

Yes, it's a valid bug report; I just verified it on my box. [~cnauroth], the fix from
HDFS-6231 has a minor side effect (the original behavior was to call countDown only
after a successful actualGetFromOneDataNode() call): since we moved the latch into
the finally block, "latch.countDown();" still fires when an actualGetFromOneDataNode()
call fails. getFirstToComplete() then passes "latch.await" and goes on to
"throw new InterruptedException", so hedgedFetchBlockByteRange() tries
getBestNodeDNAddrPair() again. In the test case there are two replicas, and the
earlier failed actualGetFromOneDataNode() call has already added one DN to the dead
node list :) so the loop in hedgedFetchBlockByteRange() effectively becomes:
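{code}
while (true) {
  try {
    getBestNodeDNAddrPair(...);  // pick a datanode; one DN is already in the dead node list
    getFirstToComplete(...);     // passes latch.await after a failed read, then throws InterruptedException
  } catch (InterruptedException e) {
    // caught here, so the loop immediately retries
  }
}
{code}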
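To illustrate the side effect of moving the latch, here is a minimal, self-contained sketch (not the actual DFSInputStream code) using a java.util.concurrent.CountDownLatch(1). A worker thread stands in for actualGetFromOneDataNode() and always fails; the caller stands in for getFirstToComplete() waiting on the latch. The class name LatchPlacementDemo, the helper simulatedFailingRead(), and the 200 ms timeout are hypothetical choices made only for this illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo, not HDFS code: compares the two latch placements.
final class LatchPlacementDemo {

  // Hypothetical stand-in for a failing actualGetFromOneDataNode() call.
  static void simulatedFailingRead() {
    throw new RuntimeException("simulated datanode read failure");
  }

  /**
   * Runs one simulated read that always fails and reports whether the
   * waiting thread was released before the timeout.
   *
   * @param countDownInFinally true models the post-HDFS-6231 placement,
   *                           false models counting down only on success
   */
  static boolean waiterReleasedAfterFailure(boolean countDownInFinally) {
    CountDownLatch latch = new CountDownLatch(1);
    ExecutorService pool = Executors.newSingleThreadExecutor();
    pool.submit(() -> {
      try {
        simulatedFailingRead();
        if (!countDownInFinally) {
          latch.countDown();     // original behavior: reached only on success
        }
      } catch (RuntimeException e) {
        // read failed; in HDFS the DN would be added to the dead node list here
      } finally {
        if (countDownInFinally) {
          latch.countDown();     // post-HDFS-6231 behavior: fires even on failure
        }
      }
    });
    boolean released;
    try {
      // await returns true only if the count reached zero before the timeout
      released = latch.await(200, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      released = false;
    } finally {
      pool.shutdownNow();
    }
    return released;
  }

  public static void main(String[] args) {
    System.out.println("countDown in finally:      released = "
        + waiterReleasedAfterFailure(true));
    System.out.println("countDown only on success: released = "
        + waiterReleasedAfterFailure(false));
  }
}
```

With the finally placement the waiter is released immediately even though no read succeeded, which is what lets the caller fall through to the retry path on every iteration; with the success-only placement the waiter stays blocked until the timeout and main prints `released = false` for that case.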

I'll make a patch soon, thank you [~liulei.cn] !

> while loop is executed tens of thousands of times in Hedged Read
> ------------------------------------------------------------------
>
>                 Key: HDFS-6591
>                 URL: https://issues.apache.org/jira/browse/HDFS-6591
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.4.0
>            Reporter: LiuLei
>            Assignee: Liang Xie
>         Attachments: LoopTooManyTimesTestCase.patch
>
>
> I downloaded the hadoop-2.4.1-rc1 code from
> http://people.apache.org/~acmurthy/hadoop-2.4.1-rc1/ and tested Hedged
> Read. I found that the while loop in the hedgedFetchBlockByteRange method is
> executed tens of thousands of times.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
