[
https://issues.apache.org/jira/browse/HDFS-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041856#comment-14041856
]
Liang Xie commented on HDFS-6591:
---------------------------------
Yes, it's a valid bug report; I just verified it on my box. [~cnauroth], the fix
from HDFS-6231 has a minor side effect. The original behavior was to call
countDown() only after a successful actualGetFromOneDataNode() call, but since we
moved the latch into the finally block, "latch.countDown();" now fires even when
actualGetFromOneDataNode() fails. getFirstToComplete() then gets past
"latch.await" and goes on to "throw new InterruptedException", so
hedgedFetchBlockByteRange() keeps retrying getBestNodeDNAddrPair(). In the test
case there are two replicas, and one DN has already been added to the dead node
list because of the earlier failed actualGetFromOneDataNode() call :) So the loop
in hedgedFetchBlockByteRange() effectively looks like this:
{code}
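while (true) {
  try {
    getBestNodeDNAddrPair();
    getFirstToComplete();  // gets past latch.await at once, then throws
  } catch (InterruptedException e) {
    // exception is handled here and the loop simply goes around again
  }
}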
{code}
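To make the latch behavior concrete, here is a minimal standalone sketch, not the DFSInputStream code itself. The class LatchFinallyDemo and the methods failingRead() and waiterReleasedAfterFailure() are hypothetical names made up for illustration; only java.util.concurrent is used. It assumes a single read that always fails, and compares counting down in a finally block with counting down only after success:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LatchFinallyDemo {

    /** Hypothetical stand-in for a read from a datanode that always fails. */
    static void failingRead() throws Exception {
        throw new java.io.IOException("simulated datanode failure");
    }

    /**
     * Submits one failing read and reports whether a waiter on the latch
     * was released. countDownInFinally=true mimics counting down in a
     * finally block; false mimics counting down only after a successful read.
     */
    static boolean waiterReleasedAfterFailure(boolean countDownInFinally)
            throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(1);
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // "return null" makes the lambda a Callable, so it may throw.
            pool.submit(() -> {
                if (countDownInFinally) {
                    try {
                        failingRead();
                    } finally {
                        latch.countDown();   // fires even though the read failed
                    }
                } else {
                    failingRead();
                    latch.countDown();       // never reached on failure
                }
                return null;
            });
            // True if the latch reached zero before the timeout expired.
            return latch.await(200, TimeUnit.MILLISECONDS);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("finally:      released=" + waiterReleasedAfterFailure(true));
        System.out.println("success-only: released=" + waiterReleasedAfterFailure(false));
    }
}
```

With the finally variant the waiter is released although nothing was read, which is the condition that lets a caller loop without blocking. With the success-only variant the waiter stays blocked until the timeout.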
I'll make a patch soon. Thank you, [~liulei.cn]!
> while loop is executed tens of thousands of times in Hedged Read
> ------------------------------------------------------------------
>
> Key: HDFS-6591
> URL: https://issues.apache.org/jira/browse/HDFS-6591
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs-client
> Affects Versions: 2.4.0
> Reporter: LiuLei
> Assignee: Liang Xie
> Attachments: LoopTooManyTimesTestCase.patch
>
>
> I downloaded the hadoop-2.4.1-rc1 code from
> http://people.apache.org/~acmurthy/hadoop-2.4.1-rc1/ and tested Hedged
> Read. I found that the while loop in the hedgedFetchBlockByteRange method is
> executed tens of thousands of times.