[
https://issues.apache.org/jira/browse/HDFS-11303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16047945#comment-16047945
]
Wei-Chiu Chuang commented on HDFS-11303:
----------------------------------------
Hi [~zhangchen], thanks for filing the jira and posting the patch.
I'd like to help reviewing the patch. The fix itself looks good. Here's my
quick comments, mostly cosmetic:
1. I am not so sure about the test: you mentioned the client had timeouts
connecting to DNs, but the test throws ChecksumException -- it is not clear to
me if client exhibit the same symptom in both scenarios. Perhaps you can use
{{DFSClientFaultInjector.readFromDatanodeDelay}} to insert a delay instead?
2. Could you add test timeout limit? For example a 10 second timeout:
{{\@Test(timeout=100000)}}
3. It appears to me that the test expects to throw BlockMissingException.
Instead of the following:
{code}
} catch (BlockMissingException e) {
assertTrue(true);
}
{code} Would you mind update the test and use ExpectedException to assert that
the exception is expected?
4.
{code}
if (true) {
System.out.println("-------------- throw Checksum Exception");
throw new ChecksumException("ChecksumException test", 100);
}
{code}
Please remove {{if(true)}} and please use DFSClient.LOG instead of
System.out.println for log printing
5. There's a slight code conflict due to HDFS-11708. Please rebase the patch.
> Hedged read might hang infinitely if read data from all DN failed
> ------------------------------------------------------------------
>
> Key: HDFS-11303
> URL: https://issues.apache.org/jira/browse/HDFS-11303
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs-client
> Affects Versions: 3.0.0-alpha1
> Reporter: Chen Zhang
> Assignee: Chen Zhang
> Attachments: HDFS-11303-001.patch, HDFS-11303-001.patch,
> HDFS-11303-002.patch, HDFS-11303-002.patch
>
>
> Hedged read will read from a DN first, if timeout, then read other DNs
> simultaneously.
> If read all DN failed, this bug will cause the future-list not empty(the
> first timeout request left in list), and hang in the loop infinitely
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]