[
https://issues.apache.org/jira/browse/HDFS-17274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Xuze Yang updated HDFS-17274:
-----------------------------
Description:
When dataNodePeerStats and excludeSlowNodes are enabled, HDFS identifies slow
datanodes and excludes them when choosing targets for block placement. Avoiding
slow datanodes gives better performance. However, writes may fail once slow
datanodes are excluded. Consider the following scenarios:
* Cluster A has 4 datanodes, named dn0, dn1, dn2, dn3. From a certain moment,
dn0 is detected as slow, and dn1, dn2, dn3 become unavailable due to some
errors. File writes then fail, because the only reachable datanode is excluded.
* Cluster A has 4 datanodes, named dn0, dn1, dn2, dn3. dn0 has both ssd and
hdd disks, while dn1, dn2, dn3 only have ssd disks. From a certain moment, dn0
is detected as slow. File writes then fail under the default storage policy
"HOT", because dn0 is the only datanode with hdd storage and it is excluded.
In the situations above, I think slow datanodes should still be chosen when no
normal datanode is available; that is more reasonable than failing the write.
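
The proposed behaviour can be sketched as a two-pass selection: prefer
datanodes that are not flagged slow, and fall back to slow ones only when the
first pass cannot satisfy the request. This is a minimal illustration and not
the HDFS block placement code; the class SlowNodeFallbackSketch, the Node type,
and chooseTargets are hypothetical names, and storage types are modelled as
plain strings ("DISK" for hdd, "SSD" for ssd).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class SlowNodeFallbackSketch {

    /** Hypothetical stand-in for a datanode; not an HDFS class. */
    static final class Node {
        final String name;
        final boolean available;
        final boolean slow;
        final Set<String> storageTypes;

        Node(String name, boolean available, boolean slow, Set<String> storageTypes) {
            this.name = name;
            this.available = available;
            this.slow = slow;
            this.storageTypes = storageTypes;
        }
    }

    /** A node is eligible if it is reachable and offers the requested storage type. */
    static boolean eligible(Node n, String storageType) {
        return n.available && n.storageTypes.contains(storageType);
    }

    /**
     * Returns the names of up to {@code replicas} target nodes.
     * Pass 1 considers only nodes that are not flagged slow.
     * Pass 2 runs only if pass 1 found too few targets, and admits slow nodes.
     */
    static List<String> chooseTargets(List<Node> cluster, String storageType, int replicas) {
        List<String> chosen = new ArrayList<>();
        for (Node n : cluster) {
            if (chosen.size() == replicas) {
                break;
            }
            if (eligible(n, storageType) && !n.slow) {
                chosen.add(n.name);
            }
        }
        // Fallback: no more normal nodes, so let slow nodes be chosen.
        for (Node n : cluster) {
            if (chosen.size() == replicas) {
                break;
            }
            if (eligible(n, storageType) && n.slow) {
                chosen.add(n.name);
            }
        }
        return chosen;
    }
}
```

With this fallback, both scenarios above select dn0 instead of failing, while a
healthy cluster still avoids dn0 as long as enough normal datanodes exist.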
> slow datanodes should be chosen when no more normal datanodes are available
> ---------------------------------------------------------------------------
>
> Key: HDFS-17274
> URL: https://issues.apache.org/jira/browse/HDFS-17274
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Xuze Yang
> Priority: Major