[jira] [Updated] (HDFS-13711) Avoid using timeout datanodes for block replication

Anbang Hu (JIRA) Fri, 29 Jun 2018 17:36:19 -0700


     [ 
https://issues.apache.org/jira/browse/HDFS-13711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Anbang Hu updated HDFS-13711:
-----------------------------
    Attachment: HDFS-13711.000.patch
        Status: Patch Available  (was: Open)

> Avoid using timeout datanodes for block replication
> ---------------------------------------------------
>
>                 Key: HDFS-13711
>                 URL: https://issues.apache.org/jira/browse/HDFS-13711
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Anbang Hu
>            Assignee: Anbang Hu
>            Priority: Major
>         Attachments: HDFS-13711.000.patch
>
>
> For block replication, the source datanode is selected with some randomization in 
> {{BlockManager.chooseSourceDatanodes}} to avoid always choosing the same 
> datanode. 
> To reduce the replication failure rate further, one option is to remember 
> which datanodes were previously tried but timed out, so that the next block 
> replication attempt tries other datanodes. The list of timed-out datanodes 
> should be reset when all datanodes are exhausted. This is just one example of 
> choosing "better" sources. We can easily have other criteria for choosing 
> sources, such as avoiding nodes with a high xceiver count. The improvement 
> should therefore be designed to be as generic as possible so that it can 
> accept other criteria.
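
To make the proposal above concrete, here is a minimal sketch of the idea in Java. It is not taken from HDFS-13711.000.patch and does not use any real HDFS API: the class name TimeoutAwareSourceFilter, its methods, and the use of plain string IDs for datanodes are all hypothetical. It assumes that when no candidate survives the filters, the timeout list is cleared and every candidate becomes eligible again.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/**
 * Hypothetical helper (not part of HDFS or of the attached patch) that
 * narrows a list of candidate source datanodes before one is chosen.
 * Datanodes are represented by plain string IDs for illustration only.
 */
class TimeoutAwareSourceFilter {
    // IDs of datanodes whose previous replication attempt timed out.
    private final Set<String> timedOut = new HashSet<>();

    // Additional pluggable exclusion criteria, e.g. "xceiver count too high".
    private final List<Predicate<String>> excludeCriteria = new ArrayList<>();

    /** Registers another reason to skip a datanode. */
    void addExcludeCriterion(Predicate<String> criterion) {
        excludeCriteria.add(criterion);
    }

    /** Remembers that a replication attempt from this datanode timed out. */
    void recordTimeout(String datanodeId) {
        timedOut.add(datanodeId);
    }

    /**
     * Returns the candidates that have not timed out and that no exclusion
     * criterion rejects. If nothing is left, the timeout list is reset and
     * all candidates are returned (an assumed fallback policy).
     */
    List<String> filter(List<String> candidates) {
        List<String> preferred = new ArrayList<>();
        for (String dn : candidates) {
            boolean excluded = timedOut.contains(dn)
                || excludeCriteria.stream().anyMatch(c -> c.test(dn));
            if (!excluded) {
                preferred.add(dn);
            }
        }
        if (preferred.isEmpty()) {
            // All candidates are exhausted: forget past timeouts and start over.
            timedOut.clear();
            return new ArrayList<>(candidates);
        }
        return preferred;
    }
}
```

In this sketch the timeout list is simply one built-in exclusion rule, and other rules (such as a check on xceiver load) are added as predicates. A caller would still pick randomly among the returned datanodes, so the existing randomization is preserved.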



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

