[ https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455262#comment-13455262 ]

nkeywal commented on HDFS-3912:
-------------------------------

Some thoughts, with an HBase bias:
- if the datanode is too busy to heartbeat within a minute, we will also get 
timeouts when writing the blocks (if the datanode is dead, we hit the 20s 
connect timeout; if it's not dead, or if we already had a connection, we fail 
on the read timeout for the ack, which is around 1 minute by default).
- the recovery is on the critical path, so going to a suspicious node is not 
something you want to do.
- things are already quite complicated, so I think I would end up using the 
same value for read & write, to keep things simple.

Then there is the case where many nodes are stale. I think we're in really 
bad shape at that stage... I feel that just throwing an exception is the best 
solution: HBase would wait a few seconds and retry. That's better for the 
cluster than trying a node that is unlikely to execute the write. But it's a 
change from today's behavior.
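A minimal sketch of the "wait a few seconds and retry" behavior on the client side, assuming the write path reports the refusal as an `IOException`. `RetryOnStale`, `callWithRetries`, and its parameters are hypothetical names for illustration; they are not HBase or HDFS APIs.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/**
 * Hypothetical client-side helper: if a write attempt fails with an
 * IOException (e.g. no non-stale datanode was available), sleep and retry,
 * up to a bounded number of attempts. Other exceptions propagate at once.
 */
final class RetryOnStale {
    private RetryOnStale() {}

    static <T> T callWithRetries(Callable<T> op, int maxAttempts, long sleepMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                last = e;                      // remember the most recent failure
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis); // wait before the next attempt
                }
            }
        }
        throw last;                            // every attempt failed
    }
}
```

Bounding the number of attempts keeps a badly degraded cluster from stalling the caller forever, while the sleep gives stale nodes a chance to heartbeat again before the next try.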

To summarize, this could make sense IMHO:
- there are enough fully alive nodes: let's use them, whatever the number of 
stale nodes.
- there are not enough fully alive nodes, but there are some stale nodes that 
we could use: let's use the stale nodes; at least the behavior will be 
backward compatible.
- there are not enough live nodes: behave as today.
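The three cases above can be sketched as a target-selection function. This is a simplified, hypothetical model, not the NameNode's block placement code: `DnInfo`, `StaleAwarePlacement`, and `chooseTargets` are illustrative names, rack awareness is ignored, and the third case is reduced to returning whatever live nodes exist.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical view of a live datanode as seen by placement code. */
final class DnInfo {
    final String name;
    final boolean stale; // true if no heartbeat within the stale interval

    DnInfo(String name, boolean stale) {
        this.name = name;
        this.stale = stale;
    }
}

final class StaleAwarePlacement {
    private StaleAwarePlacement() {}

    /**
     * Picks up to 'replicas' targets among live nodes:
     * 1. enough fresh nodes: use fresh nodes only, whatever the number of stale nodes;
     * 2. not enough fresh nodes: fill the remaining slots with stale nodes
     *    (backward compatible with ignoring staleness);
     * 3. fewer live nodes than replicas: return all of them and let the
     *    caller handle the shortfall (stand-in for "as today").
     */
    static List<DnInfo> chooseTargets(List<DnInfo> liveNodes, int replicas) {
        List<DnInfo> fresh = new ArrayList<>();
        List<DnInfo> stale = new ArrayList<>();
        for (DnInfo n : liveNodes) {
            if (n.stale) {
                stale.add(n);
            } else {
                fresh.add(n);
            }
        }
        List<DnInfo> targets = new ArrayList<>();
        for (DnInfo n : fresh) {   // prefer fully alive nodes
            if (targets.size() == replicas) break;
            targets.add(n);
        }
        for (DnInfo n : stale) {   // top up with stale nodes only if needed
            if (targets.size() == replicas) break;
            targets.add(n);
        }
        return targets;
    }
}
```

Filling from fresh nodes first and only then topping up with stale ones means a stale node is never chosen while enough fresh nodes exist, and the fallback keeps writes going as they do when staleness is ignored.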

                
> Detecting and avoiding stale datanodes for writing
> --------------------------------------------------
>
>                 Key: HDFS-3912
>                 URL: https://issues.apache.org/jira/browse/HDFS-3912
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>
> 1. Make stale timeout adaptive to the number of nodes marked stale in the 
> cluster.
> 2. Consider having a separate configuration for write skipping the stale 
> nodes.

--
This message is automatically generated by JIRA.