[
https://issues.apache.org/jira/browse/HDFS-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044929#comment-14044929
]
Chris Nauroth commented on HDFS-6591:
-------------------------------------
[~liulei.cn], thank you for the bug report.
[~xieliang007], nice work on the patch. This is a pretty tricky one! I think
the logic is easier to follow now using {{CompletionService}} instead of
{{CountDownLatch}}.
I have a few comments and questions.
# {{DFSInputStream#hedgedFetchBlockByteRange}}: I have a question on the
following code, which runs when there is no need for a hedged read. (The
initial read completes before the hedged read timeout.)
{code}
future = hedgedService.poll(dfsClient.getHedgedReadTimeout(),
TimeUnit.MILLISECONDS);
if (future != null) {
future.get();
return;
}
{code}
The return value from {{future.get()}} is discarded, so it appears that we
never copy to {{buf}} in this code path. I only see a copy to {{buf}} inside
the else path, when a hedged read actually happens. Did I miss something?
# {{DFSHedgedReadMetrics}}: Is it actually useful to add the new metric for
HedgedReadOpsLoopNum, or was it just helpful for catching this bug? If it's
not going to be useful in general, then maybe we should look for a different
way to accommodate the test rather than publishing a new metric.
# {{TestPread}}: Let's set {{isHedgedRead = false}} in a JUnit {{After}} method
instead of at the end of individual methods. That way, if one test throws an
exception and leaves the flag on by mistake, it won't disrupt the other tests
in the same suite. Also, it seems this was done wrong for
{{testMaxOutHedgedReadPool}}, which is setting it to {{true}} at the end.
Using an {{After}} method would prevent those kinds of problems.
# {{TestPread#testHedgedReadLoopTooManyTimes}}: Let's guarantee that {{output}}
and {{input}} get closed. We can put the cleanup in a finally block and call
{{IOUtils#cleanup}}.
> while loop is executed tens of thousands of times in Hedged Read
> ------------------------------------------------------------------
>
> Key: HDFS-6591
> URL: https://issues.apache.org/jira/browse/HDFS-6591
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs-client
> Affects Versions: 2.4.0
> Reporter: LiuLei
> Assignee: Liang Xie
> Attachments: HDFS-6591-v2.txt, HDFS-6591.txt,
> LoopTooManyTimesTestCase.patch
>
>
> I download hadoop-2.4.1-rc1 code from
> http://people.apache.org/~acmurthy/hadoop-2.4.1-rc1/, I test the Hedged
> Read. I find the while loop in hedgedFetchBlockByteRange method is executed
> tens of thousands of times.
--
This message was sent by Atlassian JIRA
(v6.2#6252)