[ https://issues.apache.org/jira/browse/HDFS-289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079048#comment-14079048 ]
Liang Xie commented on HDFS-289:
--------------------------------
We hit an HBase write performance degradation several days ago. The root
cause turned out to be a slow network path to/from one particular datanode,
caused by a switch buffer problem. I am now interested in implementing a simple
heuristic for excluding slow DNs inside DFSOutputStream. I will post more here
later :)
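
A rough sketch of what such a client-side heuristic might look like is given
below. It is only an illustration under stated assumptions, not actual
DFSOutputStream code: the class SlowDatanodeTracker, its methods, and all
parameters (smoothing factor, latency threshold, minimum sample count,
exclusion period) are hypothetical names chosen for this example. The idea is
to keep a moving average of ack latency per datanode and to exclude a node
temporarily once that average stays above a threshold.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper, not part of HDFS: tracks per-datanode ack latency on
 * the client side and temporarily excludes nodes that look slow.
 */
class SlowDatanodeTracker {
    private static final class Stats {
        double ewmaMillis;   // exponentially weighted moving average of ack latency
        int samples;         // observations since the last reset
        long excludedUntil;  // time (ms) until which the node is excluded; 0 = never excluded
    }

    private final Map<String, Stats> statsByNode = new HashMap<>();
    private final double alpha;            // EWMA smoothing factor in (0, 1]
    private final double thresholdMillis;  // average latency above this counts as slow
    private final int minSamples;          // do not judge a node on fewer samples
    private final long exclusionMillis;    // how long a slow node stays excluded

    SlowDatanodeTracker(double alpha, double thresholdMillis,
                        int minSamples, long exclusionMillis) {
        this.alpha = alpha;
        this.thresholdMillis = thresholdMillis;
        this.minSamples = minSamples;
        this.exclusionMillis = exclusionMillis;
    }

    /** Record one observed ack latency for a datanode at time nowMillis. */
    synchronized void recordAck(String datanode, double latencyMillis, long nowMillis) {
        Stats s = statsByNode.computeIfAbsent(datanode, k -> new Stats());
        s.ewmaMillis = (s.samples == 0)
                ? latencyMillis
                : alpha * latencyMillis + (1 - alpha) * s.ewmaMillis;
        s.samples++;
        if (s.samples >= minSamples && s.ewmaMillis > thresholdMillis) {
            s.excludedUntil = nowMillis + exclusionMillis;
            s.samples = 0; // the node gets a fresh trial once the exclusion expires
        }
    }

    /** True while the datanode is inside its exclusion period. */
    synchronized boolean isExcluded(String datanode, long nowMillis) {
        Stats s = statsByNode.get(datanode);
        return s != null && nowMillis < s.excludedUntil;
    }
}
```

The exclusion expires after a fixed period so that a node which recovers (for
example after the switch is fixed) is tried again rather than being
blacklisted forever. How the excluded set would be fed back into pipeline
selection is left open here.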
> HDFS should blacklist datanodes that are not performing well
> ------------------------------------------------------------
>
> Key: HDFS-289
> URL: https://issues.apache.org/jira/browse/HDFS-289
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: dhruba borthakur
>
> On a large cluster, a few datanodes can be under-performing. There have been
> cases where the network connectivity of a few of these bad datanodes was
> degraded, resulting in very long times (on the order of two hours) to
> transfer blocks to and from these datanodes.
> A similar issue arises when a single disk on a datanode fails or changes
> to read-only mode: in this case the entire datanode shuts down.
> HDFS should detect and handle network and disk performance degradation more
> gracefully. One option would be to blacklist these datanodes, de-prioritise
> their use and alert the administrator.
--
This message was sent by Atlassian JIRA
(v6.2#6252)