[ https://issues.apache.org/jira/browse/HADOOP-5713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708437#action_12708437 ]

dhruba borthakur commented on HADOOP-5713:
------------------------------------------

> when createOutputStream fails, a dfs client should take the failed datanode 
> out of the pipeline, bump the block's ge

@Hairong: This was purposely *not done* when we did the 
client-streaming-data-to-datanodes work. The reason is that doing so reduces 
the robustness of the block. You will remember that when a replica in the 
pipeline fails, the client continues writing to the other replicas, and the 
NN makes no attempt to increase that block's replication factor until the 
file is closed. This means that when we remove a datanode from a pipeline, we 
expose that block to a larger probability of going "missing or corrupt". 
This is unavoidable when the client has already written partial data to the 
block and then encounters an error in the pipeline; in that case we drop the 
bad datanode and continue with the remaining datanode(s). 
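
To make the contrast concrete, here is a minimal sketch of the mid-write 
recovery path described above (plain Java with simplified, hypothetical 
names; this is not the actual DFSClient code):

{noformat}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical sketch of the mid-write recovery described above;
// this is not the actual DFSClient code.
class MidWriteRecoverySketch {
    static List<String> dropFailedDatanode(List<String> pipeline, int badIndex) {
        // Partial data is already on the surviving replicas, so the block
        // cannot simply be abandoned: drop the failed datanode and keep
        // streaming to whatever is left. The block stays under-replicated
        // (the NN does not re-replicate it) until the file is closed.
        List<String> remaining = new ArrayList<String>(pipeline);
        remaining.remove(badIndex);
        return remaining;
    }

    public static void main(String[] args) {
        List<String> pipeline = Arrays.asList("dn1:50010", "dn2:50010", "dn3:50010");
        // dn2 fails mid-write: continue with dn1 and dn3 only.
        System.out.println(dropFailedDatanode(pipeline, 1));
    }
}
{noformat}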

On the other hand, when createOutputStream fails, we have the luxury of 
discarding all the datanodes in the current pipeline, because the client has 
not yet written any data to any of them. We could have dropped only the bad 
datanode (as you suggested), but that would expose the block to a higher 
probability of going "missing/corrupt" if the other two replicas also fail 
sometime in the near future, before the file is closed. In this case we can 
avoid that degradation by fetching an entirely new pipeline from the NN.
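
And a similar sketch of the setup-failure path, again with hypothetical 
names rather than the real DFSClient/NN interfaces:

{noformat}
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical sketch of the setup-failure path described above;
// this is not the actual DFSClient code.
class BlockSetupRecoverySketch {

    // Stand-in for the client's view of the namenode (hypothetical API).
    interface NamenodeStub {
        void abandonBlock(String blockId);
        List<String> allocateNewBlockPipeline();
    }

    static List<String> handleSetupFailure(NamenodeStub nn, String abandonedBlock) {
        // No data has been written yet, so the whole pipeline is disposable:
        // abandon the block whose pipeline could not be set up...
        nn.abandonBlock(abandonedBlock);
        // ...and ask the NN for a brand new block with a fresh, full-width
        // pipeline, instead of reusing the survivors of the failed one.
        return nn.allocateNewBlockPipeline();
    }

    public static void main(String[] args) {
        NamenodeStub nn = new NamenodeStub() {
            public void abandonBlock(String blockId) {
                System.out.println("abandoning " + blockId);
            }
            public List<String> allocateNewBlockPipeline() {
                return Arrays.asList("dn1:50010", "dn4:50010", "dn5:50010");
            }
        };
        System.out.println(handleSetupFailure(nn, "blk_-6792221430152215651_1003"));
    }
}
{noformat}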

@Alban: Increasing the number of write retries in that case won't help. 

I understand your use-case now. The NN takes 10 minutes of no heartbeats from 
a datanode to declare it dead. Is it possible for you to set 
dfs.client.block.write.retries to a value that causes the client to retry for 
more than 10 minutes? In that case, your test case should succeed. The idea 
is that if the client does not bail out but keeps retrying for more than 10 
minutes, it is bound to succeed. Please let us know.
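
For example, something along these lines (a sketch only: the property name 
is the real dfs.client.block.write.retries, but the value 150 is a guess; in 
the log below each attempt takes roughly 6 seconds, so on the order of a 
hundred-plus retries should span more than 10 minutes):

{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

// Sketch: raise the client-side block-write retry count so the client keeps
// retrying past the ~10-minute window the NN needs to declare a datanode
// dead. The value 150 is a guess, not a tested recommendation.
public class RaiseWriteRetries {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setInt("dfs.client.block.write.retries", 150);
        FileSystem fs = FileSystem.get(conf);
        // ... run the 1 GB write from the test case against this fs ...
        fs.close();
    }
}
{noformat}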

I will also look at your patch in greater detail.




> File write fails after data node goes down
> ------------------------------------------
>
>                 Key: HADOOP-5713
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5713
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Alban Chevignard
>         Attachments: failed_write.patch
>
>
> If a data node goes down while a file is being written to HDFS, the write 
> fails with the following errors:
> {noformat} 
> 09/04/20 17:15:39 INFO dfs.DFSClient: Exception in createBlockOutputStream 
> java.io.IOException:
> Bad connect ack with firstBadLink 192.168.0.66:50010
> 09/04/20 17:15:39 INFO dfs.DFSClient: Abandoning block 
> blk_-6792221430152215651_1003
> 09/04/20 17:15:45 INFO dfs.DFSClient: Exception in createBlockOutputStream 
> java.io.IOException:
> Bad connect ack with firstBadLink 192.168.0.66:50010
> 09/04/20 17:15:45 INFO dfs.DFSClient: Abandoning block 
> blk_-1056044503329698571_1003
> 09/04/20 17:15:51 INFO dfs.DFSClient: Exception in createBlockOutputStream 
> java.io.IOException:
> Bad connect ack with firstBadLink 192.168.0.66:50010
> 09/04/20 17:15:51 INFO dfs.DFSClient: Abandoning block 
> blk_-1144491637577072681_1003
> 09/04/20 17:15:57 INFO dfs.DFSClient: Exception in createBlockOutputStream 
> java.io.IOException:
> Bad connect ack with firstBadLink 192.168.0.66:50010
> 09/04/20 17:15:57 INFO dfs.DFSClient: Abandoning block 
> blk_6574618270268421892_1003
> 09/04/20 17:16:03 WARN dfs.DFSClient: DataStreamer Exception: 
> java.io.IOException:
> Unable to create new block.
>       at 
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2387)
>       at 
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1746)
>       at 
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1924)
> 09/04/20 17:16:03 WARN dfs.DFSClient: Error Recovery for block 
> blk_6574618270268421892_1003 bad datanode[1]
> {noformat} 
> The tests were done with the following configuration:
> * Hadoop version 0.18.3
> * 3 data nodes with replication count of 2
> * 1 GB file write
> * 1 data node taken down during write
> This issue seems to be caused by the fact that there is a delay between the 
> time a data node goes down and the time it is marked as dead by the name 
> node. This delay is unavoidable, but the name node should not keep allocating 
> new blocks to data nodes that the client already knows to be down. Even after 
> adjusting {{heartbeat.recheck.interval}}, there is still a window during 
> which this issue can occur.
> One possible fix would be to allow clients to exclude known bad data nodes 
> when allocating new blocks. See {{failed_write.patch}} for an example.
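
A rough illustration of the exclusion idea described above (hypothetical 
names only; this is not the actual failed_write.patch): the client remembers 
datanodes it failed to connect to and passes that list along when asking the 
NN for new block targets.

{noformat}
import java.util.HashSet;
import java.util.Set;

// Rough, hypothetical illustration of client-side datanode exclusion; this
// is not the actual failed_write.patch.
class ExcludeBadDatanodesSketch {
    // Datanodes the client has recently failed to connect to.
    private final Set<String> excluded = new HashSet<String>();

    void recordBadDatanode(String datanode) {
        excluded.add(datanode); // e.g. "192.168.0.66:50010" from the log above
    }

    // Passed to the NN when requesting a new block so it does not allocate
    // targets the client already knows are unreachable.
    Set<String> excludedNodesForNextBlock() {
        return new HashSet<String>(excluded);
    }
}
{noformat}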

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
