[ 
https://issues.apache.org/jira/browse/HDFS-11738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127631#comment-16127631
 ] 

Vinayakumar B commented on HDFS-11738:
--------------------------------------

Thanks [~jojochuang] for reviewing the changes.
bq.  IIUC, the client would stuck in chooseDataNode() in such a scenario? 
Yeah, the reader thread keeps retrying until the max retries are exhausted, 
and then gets {{BlockMissingException}}. But since this is a hedged read, the 
read already submitted would have completed against the actual host. So the 
read completes successfully, but the call returns to the user only after all 
retries are exhausted. In the non-hedged case, the read would fail. That was 
fixed in HDFS-11708.

bq. The method chooseDataNode should add a @Nullable to indicate a null return 
value is valid.
I tried to add @Nullable, but my IDE started showing a javadoc error. So I 
added full javadoc that mentions the possible null return value instead. Hope 
that satisfies you.
 
bq. can be simplified as {{chosenNode = chooseDataNode(block, ignored, false);}}
That's a good catch. Changed.

{quote}The timeout of 30 seconds seems a little short. On my laptop this test 
takes approximately 20 seconds, so on a busy host the unit test might 
potentially run slightly over time. Or would it be reasonable to reduce some 
wait time?
E.g. reduce dfs.client.retry.window.base from 3000 to 1000?{quote}
Yeah, I increased the test timeout to 60000 and reduced 
dfs.client.retry.window.base to 1000 as well. Thank you for the hint.
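
For reference, a rough sketch of where the ~20 seconds comes from and what the smaller window buys. It assumes the wait before each location refetch is a fixed grace period of window * failures plus a random share of an expanding window of window * (failures + 1), and that there are 3 refetch rounds as in the issue description. The class and method names are made up for illustration, this is not the actual client code.

```java
/** Hypothetical helper (not part of Hadoop): models the wait between location refetches. */
final class RetryBackoffSketch {

    /**
     * Wait before the next refetch after {@code failures} earlier failures.
     * Assumed formula: grace period plus a random share (rand in [0, 1))
     * of an expanding window.
     */
    static double waitMillis(int timeWindowMs, int failures, double rand) {
        return timeWindowMs * failures + timeWindowMs * (failures + 1) * rand;
    }

    /** Total wait over {@code retries} rounds when every random draw equals {@code rand}. */
    static double totalMillis(int timeWindowMs, int retries, double rand) {
        double total = 0;
        for (int failures = 0; failures < retries; failures++) {
            total += waitMillis(timeWindowMs, failures, rand);
        }
        return total;
    }

    public static void main(String[] args) {
        // Compare the old window (3000) with the reduced one (1000).
        for (int window : new int[] {3000, 1000}) {
            System.out.printf("window=%d ms: expected %.0f ms, upper bound %.0f ms%n",
                window, totalMillis(window, 3, 0.5), totalMillis(window, 3, 1.0));
        }
    }
}
```

Under these assumptions the expected total wait is about 18 seconds with a window of 3000 and about 6 seconds with a window of 1000, so the 60000 timeout leaves plenty of headroom.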

Please check the updated patch.

> Hedged pread takes more time when block moved from initial locations
> --------------------------------------------------------------------
>
>                 Key: HDFS-11738
>                 URL: https://issues.apache.org/jira/browse/HDFS-11738
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>            Reporter: Vinayakumar B
>            Assignee: Vinayakumar B
>         Attachments: HDFS-11738-01.patch, HDFS-11738-02.patch, 
> HDFS-11738-03.patch, HDFS-11738-04.patch
>
>
> Scenario : 
> Same as HDFS-11708.
> During a hedged read, 
> 1. The first two locations fail to read the data in hedged mode.
> 2. chooseData refetches locations and adds a future to read from DN3.
> 3. After adding the future for DN3, the main thread goes on refetching 
> locations in a loop and is stuck there until all 3 retries to fetch locations 
> are exhausted, which takes ~20 seconds with the exponential retry time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

