[ https://issues.apache.org/jira/browse/HDFS-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029786#comment-13029786 ]

Bharath Mundlapudi commented on HDFS-664:
-----------------------------------------

Is this Jira similar to this:
https://issues.apache.org/jira/browse/HDFS-1362

> Add a way to efficiently replace a disk in a live datanode
> ----------------------------------------------------------
>
>                 Key: HDFS-664
>                 URL: https://issues.apache.org/jira/browse/HDFS-664
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: data-node
>    Affects Versions: 0.22.0
>            Reporter: Steve Loughran
>         Attachments: HDFS-664.0-20-3-rc2.patch.1, HDFS-664.patch
>
>
> In clusters where the datanode disks are hot swappable, you need to be able
> to swap out a disk on a live datanode without taking down the datanode. You
> don't want to decommission the whole node as that is overkill. on a system
> with 4 1TB HDDs, giving 3 TB of datanode storage, a decommissioning and
> restart will consume up to 6 TB of bandwidth. If a single disk were swapped
> in then there would only be 1TB of data to recover over the network. More
> importantly, if that data could be moved to free space on the same machine,
> the recommissioning could take place at disk rates, not network speeds.
> # Maybe have a way of decommissioning a single disk on the DN; the files
> could be moved to space on the other disks or the other machines in the rack.
> # There may not be time to use that option, in which case pulling out the
> disk would be done with no warning, a new disk inserted.
> # The DN needs to see that a disk has been replaced (or react to some ops
> request telling it this), and start using the new disk again -pushing back
> data, rebuilding the balance.
> To complicate the process, assume there is a live TT on the system, running
> jobs against the data. The TT would probably need to be paused while the work
> takes place, any ongoing work handled somehow. Halting the TT and then
> restarting it after the replacement disk went in is probably simplest.
> The more disks you add to a node, the more this scenario becomes a need.