[
https://issues.apache.org/jira/browse/HADOOP-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12589267#action_12589267
]
Raghu Angadi commented on HADOOP-3124:
--------------------------------------
Yes. This patch lowers the value to 8 min. I think 2 min is too short, because
1 min leads to multiple false errors on the cluster I am using for HADOOP-3132.
Currently we have this timeout only to catch rare exceptions. I made sure that
there are no changes to any logic in the patch other than using regular sockets
when the timeout is 0. This is good for 0.17.
> DFS data node should not use hard coded 10 minutes as write timeout.
> --------------------------------------------------------------------
>
> Key: HADOOP-3124
> URL: https://issues.apache.org/jira/browse/HADOOP-3124
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.17.0
> Reporter: Runping Qi
> Assignee: Raghu Angadi
> Fix For: 0.18.0
>
> Attachments: HADOOP-3124.patch, HADOOP-3124.patch
>
>
> This problem happens in 0.17 trunk.
> I saw reducers wait 10 minutes for writing data to dfs and then get a timeout.
> The client retried and timed out again after another 19 minutes.
> After looking into the code, it seems that the dfs data node uses 10 minutes
> as the timeout for writing data into the data node pipeline.
> I think we have three issues:
> 1. The 10 minute timeout value is too big for writing a chunk of data (64K)
> through the data node pipeline.
> 2. The timeout value should not be hard coded.
> 3. Different datanodes in a pipeline should use different timeout values for
> writing to the downstream.
> A reasonable one may be (20 secs * numOfDataNodesInTheDownStreamPipe).
> For example, if the replication factor is 3, the client uses 60 secs, the
> first data node uses 40 secs, and the second datanode uses 20 secs.