[ 
https://issues.apache.org/jira/browse/HDFS-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007053#comment-13007053
 ] 

Sanjay Radia commented on HDFS-1362:
------------------------------------

I would like to better understand the use case for this.
The main use case is that one wants to hot plug a new drive without
restarting the DN daemon. Correct?

I just talked to my operations team and they told me that they will not hot
replace individual drives. It's too risky, as an operator may replace the wrong
drive, and they have doubts about how well the OS will deal with this.
They also point out that one has to format and mount the volume, and hence
log in to the machine.
The approach our ops team is planning for our 12-disk nodes is to wait until
about 3 disks have failed, then decommission the DN and replace the drives.

Allen, what are your thoughts on the use case for this feature?

> Provide volume management functionality for DataNode
> ----------------------------------------------------
>
>                 Key: HDFS-1362
>                 URL: https://issues.apache.org/jira/browse/HDFS-1362
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: data-node
>    Affects Versions: 0.23.0
>            Reporter: Wang Xu
>            Assignee: Wang Xu
>             Fix For: 0.23.0
>
>         Attachments: DataNode Volume Refreshment in HDFS-1362.pdf, 
> HDFS-1362.4_w7001.txt, HDFS-1362.5.patch, HDFS-1362.6.patch, 
> HDFS-1362.7.patch, HDFS-1362.txt, Provide_volume_management_for_DN_v1.pdf
>
>
> The current management unit in Hadoop is a node, i.e. if a node fails, it
> will be kicked out and all the data on the node will be replicated.
> As almost all SATA controllers support hotplug, we add a new command line
> interface to the datanode so that it can list, add or remove a volume online,
> which means we can change a disk without decommissioning the node. Moreover,
> if the failed disk is still readable and the node has enough space, it can
> migrate data on that disk to other disks in the same node.
> A more detailed design document will be attached.
> The original version in our lab is implemented against the 0.20 datanode
> directly. Would it be better to implement it in contrib? Or is there any
> other suggestion?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
