star created HDFS-15299:
---------------------------
Summary: Add an option to enable reporting&removing flaky disk
Key: HDFS-15299
URL: https://issues.apache.org/jira/browse/HDFS-15299
Project: Hadoop HDFS
Issue Type: Bug
Reporter: star
Assignee: star
In our production environment, where disks are more than 8 years old, many
DataNodes (DNs) are treated as dead because their disks are partially broken.
The NameNode (NN) then rebalances data blocks across the cluster, which
introduces high disk load. To reduce the impact of flaky disks, we'd like to
extend the tolerance mechanism to cover partial disk failures.
As described in HDFS-10777, the du command can still throw an exception on
a heavily loaded disk. Simply removing a flaky disk is brittle, because the
disk may recover later. However, that is rare in our production environment.
So could we add an option to enable partial disk failure tolerance for users
who have mostly broken disks and care more about the stability of the cluster?
We will replace those old disks in the future, but until then the HDFS
cluster will keep running on those servers for a long time.
Comments are appreciated.