[ http://issues.apache.org/jira/browse/HADOOP-698?page=all ]
Milind Bhandarkar updated HADOOP-698:
-------------------------------------

    Attachment: datanode-exclude.patch

Patch attached. It also includes two unit tests.

> When DFS client fails to read from a datanode, the failed datanode is not excluded from target reselection
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-698
>                 URL: http://issues.apache.org/jira/browse/HADOOP-698
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Hairong Kuang
>         Assigned To: Milind Bhandarkar
>         Attachments: datanode-exclude.patch
>
>
> In the method read(byte buf[], int off, int len) of DFSInputStream, when a
> read fails, it calls "blockSeekTo" to reselect a datanode. However, the
> failed datanode is not fed back to blockSeekTo. The datanode selection
> algorithm works as follows:
> * If the machine that the client is running on has a local copy, return the
>   local machine;
> * Otherwise, randomly pick one location.
> Because the failed datanode is not fed back to target reselection, this
> leads to two flaws:
> 1. When a client fails to read from the local copy, for example because of
>    a checksum error, the local machine will always be chosen in retries.
> 2. Random selection may still return the same failed node.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
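The selection rule described in the issue (prefer the local copy, otherwise pick a random location) can be extended to skip nodes that already failed. The following is a minimal sketch under stated assumptions, not the contents of datanode-exclude.patch: the class `DeadNodeAwareSelector` and its methods `markDead` and `chooseDataNode` are hypothetical names, datanodes are represented as plain strings rather than Hadoop's own descriptor types, and returning `null` when every replica has failed is an assumed policy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

/**
 * Hypothetical sketch: datanode reselection that remembers failed nodes.
 * Strings stand in for datanode descriptors (an assumption for illustration).
 */
public class DeadNodeAwareSelector {
    private final Set<String> deadNodes = new HashSet<>();
    private final Random random;

    public DeadNodeAwareSelector(Random random) {
        this.random = random;
    }

    /** Record a datanode that failed during a read so later retries skip it. */
    public void markDead(String node) {
        deadNodes.add(node);
    }

    /**
     * Prefer the local machine if it holds a copy and has not failed;
     * otherwise pick uniformly at random among the non-failed locations.
     * Returns null if every location has failed (assumed policy).
     */
    public String chooseDataNode(List<String> locations, String localHost) {
        if (locations.contains(localHost) && !deadNodes.contains(localHost)) {
            return localHost;
        }
        List<String> candidates = new ArrayList<>();
        for (String node : locations) {
            if (!deadNodes.contains(node)) {
                candidates.add(node);
            }
        }
        if (candidates.isEmpty()) {
            return null;
        }
        return candidates.get(random.nextInt(candidates.size()));
    }

    public static void main(String[] args) {
        DeadNodeAwareSelector selector = new DeadNodeAwareSelector(new Random(42));
        List<String> locations = Arrays.asList("dn1", "dn2", "dn3");

        // First attempt: the local copy on dn1 is preferred.
        System.out.println(selector.chooseDataNode(locations, "dn1")); // prints "dn1"

        // Simulate a checksum error on dn1; the retry must pick another replica.
        selector.markDead("dn1");
        String retry = selector.chooseDataNode(locations, "dn1");
        System.out.println(!"dn1".equals(retry)); // prints "true"

        // Once every replica has failed, no candidate remains.
        selector.markDead("dn2");
        selector.markDead("dn3");
        System.out.println(selector.chooseDataNode(locations, "dn1")); // prints "null"
    }
}
```

Keeping the failed nodes in a set that both branches consult addresses both flaws listed in the issue: the local machine is no longer chosen unconditionally, and the random pick can no longer land on a node that already failed. What to do once all replicas are excluded is left to the caller in this sketch.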