[ https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506791#comment-13506791 ]
Andy Isaacson commented on HDFS-4239:
-------------------------------------
To expand on my previous comment:
I tested on trunk, on a DN with {{dfs.datanode.failed.volumes.tolerated=1}}.
Running {{chmod 0 /data/5/datadir/current}} caused the DN to eject the volume
and continue operating. I then used {{lsof -p}} to check which file descriptors
remained open and observed that {{/data/5/datadir/in_use.lock}} was still open.
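
For anyone who wants to repeat the {{lsof -p}} check from a script, here is a rough sketch in Python. It is not part of the DN or of any Hadoop tool; the function names are made up for illustration, and it assumes a Linux host where {{/proc/<pid>/fd}} is readable by the caller.

```python
import os


def paths_under(paths, root):
    """Return the paths that are `root` itself or live beneath it."""
    root = os.path.normpath(root)
    prefix = root.rstrip(os.sep) + os.sep
    matches = []
    for path in paths:
        normalized = os.path.normpath(path)
        # Compare against "root/" so /data/5 does not also match /data/50.
        if normalized == root or normalized.startswith(prefix):
            matches.append(path)
    return matches


def open_files(pid):
    """Resolve every file descriptor of `pid` via /proc (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    targets = []
    for name in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets


def open_files_on_volume(pid, volume_root):
    """List what `pid` still holds open under `volume_root`."""
    return paths_under(open_files(pid), volume_root)
```

Calling {{open_files_on_volume(dn_pid, "/data/5/datadir")}} after the {{chmod 0}} step should, if it behaves like the {{lsof}} run described above, still list {{/data/5/datadir/in_use.lock}}. Here {{dn_pid}} stands for the DN's process id, which you would have to look up yourself.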
> Means of telling the datanode to stop using a sick disk
> -------------------------------------------------------
>
> Key: HDFS-4239
> URL: https://issues.apache.org/jira/browse/HDFS-4239
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: stack
>
> If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing
> occasionally, or just exhibiting high latency -- your choices are:
> 1. Decommission the entire datanode. If the datanode is carrying 6 or 12
> disks of data, especially on a cluster that is smallish -- 5 to 20 nodes --
> the rereplication of the downed datanode's data can be pretty disruptive,
> especially if the cluster is doing low latency serving: e.g. hosting an hbase
> cluster.
> 2. Stop the datanode, unmount the bad disk, and restart the datanode (you
> can't unmount the disk while it is in use). The latter is better in that
> only the bad disk's data is rereplicated, not all datanode data.
> Is it possible to do better, say, by sending the datanode a signal telling it
> to stop using a disk an operator has designated 'bad'? This would be like
> option #2 above minus the need to stop and restart the datanode. Ideally the
> disk could then be unmounted after a while.
> Nice to have would be being able to tell the datanode to resume using a disk
> after it's been replaced.