[ https://issues.apache.org/jira/browse/HDFS-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Uma Maheswara Rao G resolved HDFS-2378.
---------------------------------------
    Resolution: Duplicate

The patch in HDFS-2637 has already received a +1. After discussing with Todd, this issue can be closed, so I am marking it as a duplicate of HDFS-2637.

> recoverBlock timeout in DFSClient should be longer
> --------------------------------------------------
>
>                 Key: HDFS-2378
>                 URL: https://issues.apache.org/jira/browse/HDFS-2378
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client
>    Affects Versions: 0.23.0, 1.1.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.24.0
>
>
> In a failure scenario when one of the datanodes in a pipeline has "frozen"
> (e.g. hard swapping or disk controller issues) we sometimes see timeouts in the
> call to recoverBlock(). This is because recoverBlock's implementation sends
> several RPCs internally (to the NN and to other nodes in the pipeline) with
> the same timeout. Since the timeouts are equal, the "outer" call times out
> first. The retry then fails since recovery is already in progress, or already
> finished.
> The best fix would be to make recoverBlock idempotent so the retry doesn't
> fail, but in the absence of that we can likely fix this issue by increasing
> the timeout to be equal to the sum of the timeouts of the underlying recovery
> calls.
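
To make the proposed workaround concrete, here is a minimal sketch of the timeout arithmetic described in the issue. It is not the actual DFSClient code or the HDFS-2637 patch: the class name, the method name, the 60-second value, and the count of three inner RPCs are all hypothetical and chosen only for illustration.

```java
public class RecoveryTimeouts {

    /**
     * Returns an outer-call timeout equal to the sum of the timeouts of the
     * inner RPCs, so the outer recoverBlock() call cannot expire before the
     * inner calls have each had their full time to finish or time out.
     */
    static long outerTimeoutMs(long... innerTimeoutsMs) {
        long total = 0L;
        for (long t : innerTimeoutsMs) {
            total += t;
        }
        return total;
    }

    public static void main(String[] args) {
        // Assumed value: every inner RPC shares one timeout, as in the report.
        long rpcTimeoutMs = 60_000L;

        // Assumed count: three inner RPCs (to the NN and to pipeline nodes).
        long outer = outerTimeoutMs(rpcTimeoutMs, rpcTimeoutMs, rpcTimeoutMs);

        System.out.println("inner timeout per RPC: " + rpcTimeoutMs + " ms");
        System.out.println("outer recoverBlock timeout: " + outer + " ms");
    }
}
```

With equal timeouts everywhere, the outer call gives up while an inner call may still be waiting on the frozen node. Summing the inner timeouts only avoids that race; it does not make the retry safe, which is why the issue calls an idempotent recoverBlock the better fix.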