[jira] [Commented] (HDFS-4239) Means of telling the datanode to stop using a sick disk

Jean-Daniel Cryans (JIRA) Wed, 05 Dec 2012 15:53:59 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510939#comment-13510939
 ]


Jean-Daniel Cryans commented on HDFS-4239:
------------------------------------------

bq. That should work just fine, if the HDFS config is compatible with the new 
set of available directories. 

We tried it today and it worked fine. We did hit an interesting problem, though: 
the region server on the same node continued to use that disk directly, since 
it's configured with local reads.

To rephrase that, a long-running BlockReaderLocal will ride over local DN 
restarts and disk "ejections". We had to drain the RS of all its regions in 
order to stop it from using the bad disk.
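
For anyone who wants to see the effect in isolation, here is a rough toy model 
of it in Python. It is only a sketch under assumptions: ToyDataNode and 
ToyLocalReader are hypothetical names made up for illustration, not Hadoop 
classes, and the real BlockReaderLocal is more involved. The point is just that 
a reader which resolved the block file once and kept it open never notices 
that the datanode has dropped the directory.

```python
import os
import tempfile


class ToyDataNode:
    """Hypothetical stand-in for a datanode: tracks which data dirs are in service."""

    def __init__(self, data_dirs):
        self.data_dirs = list(data_dirs)

    def block_path(self, block_id):
        # Resolve a block to a local file, searching only in-service directories.
        for data_dir in self.data_dirs:
            path = os.path.join(data_dir, block_id)
            if os.path.exists(path):
                return path
        raise FileNotFoundError(block_id)

    def eject(self, data_dir):
        # Models a restart with the sick disk dropped from the configured dirs.
        self.data_dirs.remove(data_dir)


class ToyLocalReader:
    """Hypothetical long-lived client: asks for the path once, then reads directly."""

    def __init__(self, datanode, block_id):
        self._file = open(datanode.block_path(block_id), "rb")

    def read(self):
        self._file.seek(0)
        return self._file.read()

    def close(self):
        self._file.close()


def demo():
    with tempfile.TemporaryDirectory() as root:
        good = os.path.join(root, "disk1")
        sick = os.path.join(root, "disk2")
        os.mkdir(good)
        os.mkdir(sick)
        with open(os.path.join(sick, "blk_1"), "wb") as f:
            f.write(b"payload")

        datanode = ToyDataNode([good, sick])
        reader = ToyLocalReader(datanode, "blk_1")

        # The operator ejects the sick disk from the datanode.
        datanode.eject(sick)
        try:
            datanode.block_path("blk_1")
            datanode_still_serves = True
        except FileNotFoundError:
            datanode_still_serves = False

        # The reader still holds its open file on the ejected disk.
        data = reader.read()
        reader.close()
        return datanode_still_serves, data
```

In this model the only ways to get the reader off the disk are to close it or 
to restart the process that owns it, which is roughly what draining the RS 
amounted to for us.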
                
> Means of telling the datanode to stop using a sick disk
> -------------------------------------------------------
>
>                 Key: HDFS-4239
>                 URL: https://issues.apache.org/jira/browse/HDFS-4239
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: stack
>
> If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing 
> occasionally, or just exhibiting high latency -- your choices are:
> 1. Decommission the entire datanode.  If the datanode is carrying 6 or 12 
> disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- 
> the rereplication of the downed datanode's data can be pretty disruptive, 
> especially if the cluster is doing low latency serving: e.g. hosting an hbase 
> cluster.
> 2. Stop the datanode, unmount the bad disk, and restart the datanode (you 
> can't unmount the disk while it is in use).  The latter is better in that 
> only the bad disk's data is rereplicated, not all datanode data.
> Is it possible to do better, say, by sending the datanode a signal telling it 
> to stop using a disk an operator has designated 'bad'?  This would be like 
> option #2 above minus the need to stop and restart the datanode.  Ideally the 
> disk would become unmountable after a while.
> Nice to have would be the ability to tell the datanode to start using a disk 
> again after it's been replaced.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
