[ http://issues.apache.org/jira/browse/HADOOP-698?page=all ]

Milind Bhandarkar updated HADOOP-698:
-------------------------------------

    Attachment: datanode-exclude.patch

Patch attached. It also includes two unit tests.

> When DFS client fails to read from a datanode, the failed datanode is not 
> excluded from target reselection
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-698
>                 URL: http://issues.apache.org/jira/browse/HADOOP-698
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Hairong Kuang
>         Assigned To: Milind Bhandarkar
>         Attachments: datanode-exclude.patch
>
>
> In the method read(byte buf[ ], int off, int len) of DFSInputStream, when a 
> read fails, it calls "blockSeekTo" to reselect a datanode. However, the 
> failed datanode is not fed back to blockSeekTo. The datanode selection 
> algorithm works as follows:
> * If the machine that the client is running on has a local copy, return the 
> local machine;
> * Otherwise, randomly pick one location.
> Because the failed datanode is not fed back into target reselection, there 
> are two flaws:
> 1. When a client fails to read from the local copy, for example, because of 
> a checksum error, the local machine will always be chosen in retries.
> 2. Random selection may still return the same failed node.
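
For illustration, the selection rule described above can be extended with an exclusion set along the following lines. This is only a minimal sketch and is not taken from datanode-exclude.patch: the class DatanodeChooser, the method choose, and the deadNodes parameter are hypothetical names, and datanodes are represented as plain host-name strings for brevity.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical helper for illustration; not the actual DFSClient code.
final class DatanodeChooser {

    /**
     * Picks a datanode holding a block replica, skipping every node in
     * deadNodes (nodes the client has already failed to read from).
     * Returns null when all replicas have been excluded.
     */
    static String choose(List<String> locations, String localHost,
                         Set<String> deadNodes, Random rng) {
        // Keep only the replicas that have not failed so far.
        List<String> candidates = new ArrayList<>();
        for (String node : locations) {
            if (!deadNodes.contains(node)) {
                candidates.add(node);
            }
        }
        if (candidates.isEmpty()) {
            return null;
        }
        // Prefer the local copy, but only if it has not been excluded.
        if (candidates.contains(localHost)) {
            return localHost;
        }
        // Otherwise pick randomly among the remaining replicas.
        return candidates.get(rng.nextInt(candidates.size()));
    }
}
```

In this sketch the caller would add a datanode to deadNodes after a failed read and then call choose again, so neither the local machine nor a randomly chosen node is returned twice once it has failed.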

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
