DFS write pipeline : DFSClient sometimes does not detect second datanode failure
---------------------------------------------------------------------------------

                 Key: HADOOP-3416
                 URL: https://issues.apache.org/jira/browse/HADOOP-3416
             Project: Hadoop Core
          Issue Type: Bug
    Affects Versions: 0.16.0
            Reporter: Raghu Angadi

When the first datanode's write to the second datanode fails or times out, 
DFSClient ends up marking the first datanode as the bad one and removes it from 
the pipeline, even though the failure was actually downstream.
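
To make the attribution concrete, here is a minimal sketch of how per-datanode 
ack statuses would let the client blame the first node that actually reported a 
failure rather than whichever node it happens to be connected to. PipelineAck, 
Status, and firstBadNode are illustrative names for this sketch, not the actual 
0.16 API:

{code:java}
// A minimal sketch (illustrative names, not the actual 0.16 API) of how
// per-datanode ack statuses let the client blame the first node that
// actually reported a failure, rather than the node it is connected to.
enum Status { SUCCESS, ERROR, ERROR_TIMEOUT }

class PipelineAck {
  final Status[] replies;   // one entry per datanode, in pipeline order
  PipelineAck(Status[] replies) { this.replies = replies; }
}

class AckProcessor {
  // Returns the index of the first datanode whose reply is not SUCCESS,
  // or -1 if every node acked. If datanode 0 merely relayed an error for
  // its downstream (datanode 1), datanode 0 is healthy and must not be
  // removed from the pipeline.
  static int firstBadNode(PipelineAck ack) {
    for (int i = 0; i < ack.replies.length; i++) {
      if (ack.replies[i] != Status.SUCCESS) {
        return i;
      }
    }
    return -1;
  }
}
{code}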
A similar problem exists on the DataNode side and is fixed in HADOOP-3339. From 
HADOOP-3339:

"The main issue is that BlockReceiver thread (and DataStreamer in the case of 
DFSClient) interrupt() the 'responder' thread. But interrupting is a pretty 
coarse control. We don't know what state the responder is in and interrupting 
has different effects depending on responder state. To fix this properly we 
need to redesign how we handle these interactions."
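
The coarseness is easy to demonstrate outside Hadoop. In the self-contained 
example below, the same interrupt() call wakes a thread blocked in sleep() with 
an InterruptedException, but on a running thread it only sets a flag that the 
thread must poll itself; a thread blocked reading from a plain java.net.Socket 
is not unblocked at all:

{code:java}
// Demonstrates that interrupt() has different effects depending on what
// the target thread is doing at that instant.
public class InterruptDemo {
  public static void main(String[] args) throws Exception {
    Thread sleeper = new Thread(new Runnable() {
      public void run() {
        try {
          Thread.sleep(60000);              // blocked in sleep():
        } catch (InterruptedException e) {  // interrupt() throws here
          System.out.println("sleeper: woken by InterruptedException");
        }
      }
    });

    Thread spinner = new Thread(new Runnable() {
      public void run() {
        // A running thread only gets a flag set; nothing happens until it
        // polls the flag itself. A thread blocked reading from a plain
        // java.net.Socket behaves the same way: it is not unblocked at all.
        while (!Thread.currentThread().isInterrupted()) { /* busy */ }
        System.out.println("spinner: noticed the interrupt flag");
      }
    });

    sleeper.start();
    spinner.start();
    Thread.sleep(100);     // let both threads reach their states
    sleeper.interrupt();   // unblocks sleeper immediately
    spinner.interrupt();   // merely flags spinner
    sleeper.join();
    spinner.join();
  }
}
{code}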

When the first datanode closes its end of the socket from DFSClient, DFSClient 
should properly read all the data left in the socket. Also, the DataNode's 
closing of the socket should not result in a TCP reset; otherwise I think 
DFSClient will not be able to read from the socket.
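
As a sketch of what a reset-free close could look like on the DataNode side 
(hypothetical helper, not existing code): half-close with shutdownOutput() and 
drain anything still in flight before close(). Closing a socket while unread 
data sits in its receive buffer typically makes the kernel send a TCP RST, 
which can destroy bytes the peer has not yet read:

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;

// Hypothetical helper: close a connection so the peer sees a clean FIN
// rather than a RST, keeping any already-sent data readable.
public class GracefulClose {
  static void closeWithoutReset(Socket s) throws IOException {
    s.shutdownOutput();              // send FIN: "no more data from me"
    InputStream in = s.getInputStream();
    byte[] buf = new byte[4096];
    while (in.read(buf) != -1) {
      // drain bytes still in flight so the receive buffer is empty
    }
    s.close();                       // empty receive buffer => FIN, not RST
  }
}
{code}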
