[ https://issues.apache.org/jira/browse/HDFS-11738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127631#comment-16127631 ]
Vinayakumar B commented on HDFS-11738:
--------------------------------------

Thanks [~jojochuang] for reviewing the changes.

bq. IIUC, the client would stuck in chooseDataNode() in such a scenario?

Yes. The reader thread keeps retrying until the maximum number of retries is reached and then gets a {{BlockMissingException}}. But since this is a hedged read, the read would already have completed from the actual host by then. So the read completes successfully, but the call returns to the user only after all retries are exhausted. In the non-hedged case the read would fail; that was fixed in HDFS-11708.

bq. The method chooseDataNode should add a @Nullable to indicate a null return value is valid.

I tried to add @Nullable, but my IDE started showing a javadoc error. So I added full javadoc that mentions the possible null return value instead. Hope that is acceptable.

bq. can be simplified as {{chosenNode = chooseDataNode(block, ignored, false);}}

That's a good catch. Changed.

{quote}The timeout of 30 seconds seems a little short. On my laptop this test takes approximately 20 seconds, so on a busy host the unit test might potentially run slightly over time. Or would it be reasonable to reduce some wait time? E.g. reduce dfs.client.retry.window.base from 3000 to 1000?{quote}

Yes, I increased the timeout to 60000 ms and also reduced the window time to 1000 ms. Thank you for the hint.

Please check the updated patch.

> Hedged pread takes more time when block moved from initial locations
> --------------------------------------------------------------------
>
>                 Key: HDFS-11738
>                 URL: https://issues.apache.org/jira/browse/HDFS-11738
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>            Reporter: Vinayakumar B
>            Assignee: Vinayakumar B
>         Attachments: HDFS-11738-01.patch, HDFS-11738-02.patch,
> HDFS-11738-03.patch, HDFS-11738-04.patch
>
>
> Scenario:
> Same as HDFS-11708.
> During a hedged read:
> 1. The first two locations fail to read the data in hedged mode.
> 2. chooseDataNode refetches locations and adds a future to read from DN3.
> 3. After adding the future for DN3, the main thread goes back to refetching locations in a loop and gets stuck there until all 3 retries to fetch locations are exhausted, which consumes ~20 seconds with exponential retry time.
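
The "~20 seconds" figure in the scenario, and the effect of lowering dfs.client.retry.window.base from 3000 to 1000, can be checked with some quick arithmetic. The sketch below is not code from the patch. It assumes the client waits {{base * failures + base * (failures + 1) * random}} milliseconds before each location refetch, which is my recollection of the backoff in DFSInputStream and should be verified against the source. It also assumes 3 refetch attempts, as stated in the scenario. The class and method names are hypothetical.

```java
public class RetryBackoffSketch {

    // Assumed backoff formula (not taken from the patch):
    // a fixed grace period of base * failures, plus a random slice of an
    // expanding window of base * (failures + 1). 'rand' stands in for the
    // random draw in [0, 1) so the result is reproducible.
    static double waitMillis(int windowBaseMs, int failures, double rand) {
        return (double) windowBaseMs * failures
                + (double) windowBaseMs * (failures + 1) * rand;
    }

    // Sum of the waits over all refetch attempts, using the same draw each time.
    static double totalMillis(int windowBaseMs, int maxFailures, double rand) {
        double total = 0;
        for (int failures = 0; failures < maxFailures; failures++) {
            total += waitMillis(windowBaseMs, failures, rand);
        }
        return total;
    }

    public static void main(String[] args) {
        int maxFailures = 3; // "all 3 retries" from the issue description
        for (int base : new int[] {3000, 1000}) {
            System.out.printf("base=%d ms: min=%.0f ms, average=%.0f ms, upper bound=%.0f ms%n",
                    base,
                    totalMillis(base, maxFailures, 0.0),
                    totalMillis(base, maxFailures, 0.5),
                    totalMillis(base, maxFailures, 1.0));
        }
    }
}
```

Under these assumptions the total wait averages about 18 seconds with a base of 3000 ms, which is in line with the ~20 seconds reported, and about 6 seconds with a base of 1000 ms. That would leave the test comfortable headroom under a 60000 ms timeout.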