[ https://issues.apache.org/jira/browse/HDFS-13711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anbang Hu updated HDFS-13711:
-----------------------------
Attachment: HDFS-13711.000.patch
Status: Patch Available (was: Open)
> Avoid using timeout datanodes for block replication
> ---------------------------------------------------
>
> Key: HDFS-13711
> URL: https://issues.apache.org/jira/browse/HDFS-13711
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Anbang Hu
> Assignee: Anbang Hu
> Priority: Major
> Attachments: HDFS-13711.000.patch
>
>
> For block replication, {{BlockManager.chooseSourceDatanodes}} randomizes the
> choice of source datanode so that the same datanode is not always chosen.
> To reduce the replication failure rate further, one option is to remember
> which datanodes were previously tried but timed out, so that the next
> replication attempt tries other datanodes instead. The list of timed-out
> datanodes should be reset once all datanodes have been exhausted. This is
> just one example of choosing "better" sources; we can easily have other
> criteria for choosing sources, such as avoiding datanodes with a high
> xceiver count. The improvement should therefore be designed as generically
> as possible so that it can accept other criteria.
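A minimal Java sketch of the generic design described above, assuming a pluggable-criterion approach. `SourceCriterion`, `TimedOutNodes` and `SourceChooser` are hypothetical names, not existing HDFS classes, and plain string IDs stand in for datanode objects. The sketch does not reproduce `BlockManager.chooseSourceDatanodes`; it only shows timed-out nodes being skipped, the list being reset once every candidate has timed out, and the existing randomization being kept among the remaining nodes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical sketch, not HDFS code: datanodes are plain String IDs here,
// and none of these types exist in org.apache.hadoop.hdfs.

/** One pluggable rule for deciding whether a node may serve as a source. */
@FunctionalInterface
interface SourceCriterion {
  boolean accept(String node);
}

/** Remembers datanodes on which a replication attempt timed out. */
final class TimedOutNodes implements SourceCriterion {
  private final Set<String> timedOut = new HashSet<>();

  void recordTimeout(String node) {
    timedOut.add(node);
  }

  /** Forgets every recorded timeout. */
  void reset() {
    timedOut.clear();
  }

  @Override
  public boolean accept(String node) {
    return !timedOut.contains(node);
  }
}

/** Picks a random source among candidates that satisfy every criterion. */
final class SourceChooser {
  private final List<SourceCriterion> criteria = new ArrayList<>();
  private final TimedOutNodes timeouts;
  private final Random random;

  SourceChooser(TimedOutNodes timeouts, Random random) {
    this.timeouts = timeouts;
    this.random = random;
    criteria.add(timeouts); // the timeout list is just one criterion
  }

  /** Registers an extra rule, e.g. an xceiver-count limit. */
  SourceChooser addCriterion(SourceCriterion criterion) {
    criteria.add(criterion);
    return this;
  }

  /** Returns a chosen source, or null if no candidate is acceptable. */
  String choose(List<String> candidates) {
    // Once every candidate has timed out, clear the list and start over.
    if (!candidates.isEmpty() && candidates.stream().noneMatch(timeouts::accept)) {
      timeouts.reset();
    }
    List<String> eligible = new ArrayList<>();
    for (String node : candidates) {
      if (criteria.stream().allMatch(c -> c.accept(node))) {
        eligible.add(node);
      }
    }
    if (eligible.isEmpty()) {
      return null;
    }
    // Keep the existing randomization, but only among eligible nodes.
    return eligible.get(random.nextInt(eligible.size()));
  }
}
```

Because the timeout list is only one `SourceCriterion`, another rule such as an xceiver limit can be added with a lambda, for example `chooser.addCriterion(n -> xceiverCount(n) < limit)`, where `xceiverCount` and `limit` are hypothetical placeholders for whatever load metric and threshold a real patch would use.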
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)