[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453551#comment-13453551
 ] 

Jing Zhao commented on HDFS-3912:
---------------------------------

Suresh's comments in HDFS-3703:
bq. However, for the write side, not picking the stale node could result in an 
issue, especially for small clusters. That is the reason why I think we should 
do the write side changes in a related jira. We should consider making stale 
timeout adaptive to the number of nodes marked stale in the cluster as 
discussed in the previous comments. Additionally we should consider having a 
separate configuration for write skipping the stale nodes.

The more detailed proposal for handling writes is: 
For writes, do not use stale datanodes (if possible). A small T for judging 
the stale state could mark many nodes as stale and concentrate writes on the 
few remaining nodes, creating new hotspots on the cluster. To avoid this, T is 
proposed to be calculated as: 

  T = t_c + (number of nodes already marked as stale) / (total number of nodes) * (T_d - t_c),

where t_c is a constant value initially set in the configuration, and T_d is 
the time for marking a node as dead (i.e., 10.5 min).
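
A minimal sketch of this calculation, for illustration only. The class and 
method names are hypothetical and are not part of HDFS; the sketch assumes all 
intervals are in milliseconds and rounds the result to the nearest millisecond.

```java
// Hypothetical helper, not an HDFS class: computes the adaptive stale
// interval T = t_c + (stale / total) * (T_d - t_c).
final class AdaptiveStaleInterval {
    private AdaptiveStaleInterval() {}

    /**
     * @param baseMs     t_c, the configured base stale interval (ms)
     * @param deadMs     T_d, the interval after which a node is marked dead (ms)
     * @param staleNodes number of nodes already marked as stale
     * @param totalNodes total number of nodes in the cluster
     * @return the adaptive stale interval T in milliseconds
     */
    static long compute(long baseMs, long deadMs, int staleNodes, int totalNodes) {
        if (totalNodes <= 0) {
            throw new IllegalArgumentException("totalNodes must be positive");
        }
        if (staleNodes < 0 || staleNodes > totalNodes) {
            throw new IllegalArgumentException("staleNodes must be in [0, totalNodes]");
        }
        if (deadMs < baseMs) {
            throw new IllegalArgumentException("deadMs must be >= baseMs");
        }
        // Fraction of the cluster currently marked stale, in [0, 1].
        double staleRatio = (double) staleNodes / totalNodes;
        // Interpolate linearly between t_c (no stale nodes) and T_d (all stale).
        return baseMs + Math.round(staleRatio * (deadMs - baseMs));
    }
}
```

With t_c = 30s and T_d = 10.5 min, compute(30000, 630000, 0, 100) gives 30000 
(30s), 50 stale nodes out of 100 give 330000 (5.5 min), and 100 out of 100 
give 630000 (10.5 min).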

E.g., t_c can be set to 30s. When no or only a few nodes are marked as stale, 
T stays small, which satisfies the HBase requirement. When a large number of 
nodes are marked as stale, e.g., close to the total number of nodes, T will be 
almost T_d (i.e., ~10 min), and the workload can still be distributed across 
all the live nodes.

When almost all nodes are marked as stale, stale nodes are included as write 
target candidates if the number of remaining non-stale live nodes is less than 
the number of replicas.
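
The fallback rule could look roughly like the sketch below. It is an 
illustration under stated assumptions, not the actual block placement code: 
the Node type, the select method, and the use of the last heartbeat time to 
judge staleness are all hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not HDFS code: picks write target candidates,
// skipping stale nodes unless too few non-stale nodes remain.
final class WriteTargetCandidates {
    private WriteTargetCandidates() {}

    /** Hypothetical stand-in for a live datanode and its last heartbeat time. */
    static final class Node {
        final String name;
        final long lastHeartbeatMs;

        Node(String name, long lastHeartbeatMs) {
            this.name = name;
            this.lastHeartbeatMs = lastHeartbeatMs;
        }
    }

    /**
     * @param liveNodes       nodes not yet marked dead
     * @param nowMs           current time (ms)
     * @param staleIntervalMs T, the interval after which a node counts as stale (ms)
     * @param replication     number of replicas to place
     * @return non-stale nodes if there are enough of them, otherwise all live nodes
     */
    static List<Node> select(List<Node> liveNodes, long nowMs,
                             long staleIntervalMs, int replication) {
        List<Node> fresh = new ArrayList<>();
        for (Node n : liveNodes) {
            // A node is treated as stale once its heartbeat is older than T.
            if (nowMs - n.lastHeartbeatMs <= staleIntervalMs) {
                fresh.add(n);
            }
        }
        // Fall back to all live nodes (stale included) only when the
        // non-stale nodes cannot hold every replica.
        return fresh.size() >= replication ? fresh : new ArrayList<>(liveNodes);
    }
}
```

For example, with two non-stale nodes and one stale node, a replication of 2 
yields only the two non-stale nodes, while a replication of 3 yields all three.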

                
> Detecting and avoiding stale datanodes for writing
> --------------------------------------------------
>
>                 Key: HDFS-3912
>                 URL: https://issues.apache.org/jira/browse/HDFS-3912
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>
> 1. Make stale timeout adaptive to the number of nodes marked stale in the 
> cluster.
> 2. Consider having a separate configuration for write skipping the stale 
> nodes.

