[
https://issues.apache.org/jira/browse/HDFS-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931209#action_12931209
]
Todd Lipcon commented on HDFS-1362:
-----------------------------------
We had a brief meeting this morning to discuss this JIRA. To summarize for the
community:
- Adding/removing volumes via RPC has the problem that the changes are not
reflected in the config file, so we risk that an admin may add a volume but
forget to modify the config. The next time the cluster is restarted, the volume
will be missing and cause problems.
- We discussed that the primary use case for this feature is restoring a volume
after it has failed. The other use case (adding a new volume to a DN that has
not suffered any issues) is rather rare.
- So, rather than providing add/list/remove APIs, we decided to simply add a
"refresh" API. There were two options suggested here:
1. Make use of the new HADOOP-7001 interface for reconfiguring daemons. In this
case an admin could modify the config file to add new volumes, and then refresh
the config to have the DN pick up new volumes, or re-add failed volumes. The
potential issue here is that, even if the configuration has not changed, we
still want the "refresh" to do something, so maybe this is not the right place.
2. Add a new RPC and command line tool, something like "dfsadmin
-restoreDNStorage <datanode IP:port>". This would not re-read the conf file,
but rather just re-check any failed volumes to see if they are newly available.
This could alternatively be triggered by a new DN servlet or something if it's
simpler.
- We also discussed pluggability (HDFS-1405). Tom and I were of the opinion
that this feature is generally useful, and we saw no compelling reason to make
it a plugin. We should just improve FSDataset directly instead of extending it
into a new Java class.
- Regarding the new feature of copying blocks from volume to volume in the case
that one volume has gone read-only, we decided that we should defer this to a
separate JIRA to be implemented after this is complete. That will make this one
smaller and easier to review.
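To make option 2 above concrete, here is a rough sketch of what "re-check any
failed volumes" could look like. It is only an illustration under stated
assumptions: the class name VolumeRestorer, its methods, and the injectable
health check are all hypothetical and are not part of FSDataset or any existing
Hadoop API. The point it shows is that the set of configured volumes is fixed
at startup, and a "restore" only moves volumes from the failed set back to the
active set without re-reading the conf file.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/**
 * Hypothetical sketch of the "restore failed volumes" idea (option 2).
 * None of these names exist in Hadoop; they are assumptions for illustration.
 */
public class VolumeRestorer {
    private final Set<Path> activeVolumes = new LinkedHashSet<>();
    private final Set<Path> failedVolumes = new LinkedHashSet<>();
    private final Predicate<Path> healthCheck;

    /** Uses a simple default check: the directory exists and is readable and writable. */
    public VolumeRestorer(List<Path> configuredVolumes) {
        this(configuredVolumes, VolumeRestorer::isUsableDirectory);
    }

    /** Lets the caller supply the health check, e.g. for testing. */
    public VolumeRestorer(List<Path> configuredVolumes, Predicate<Path> healthCheck) {
        this.activeVolumes.addAll(configuredVolumes);
        this.healthCheck = healthCheck;
    }

    /** Moves a volume from the active set to the failed set, if it was active. */
    public synchronized void markFailed(Path volume) {
        if (activeVolumes.remove(volume)) {
            failedVolumes.add(volume);
        }
    }

    /**
     * Re-checks only the volumes that previously failed. The configuration is
     * not re-read, so no new volumes can appear here. Returns the volumes
     * that passed the health check and were moved back to the active set.
     */
    public synchronized List<Path> restoreFailedVolumes() {
        List<Path> restored = new ArrayList<>();
        for (Path volume : failedVolumes) {
            if (healthCheck.test(volume)) {
                restored.add(volume);
            }
        }
        failedVolumes.removeAll(restored);
        activeVolumes.addAll(restored);
        return restored;
    }

    public synchronized Set<Path> getActiveVolumes() {
        return new LinkedHashSet<>(activeVolumes);
    }

    public synchronized Set<Path> getFailedVolumes() {
        return new LinkedHashSet<>(failedVolumes);
    }

    private static boolean isUsableDirectory(Path volume) {
        return Files.isDirectory(volume)
                && Files.isReadable(volume)
                && Files.isWritable(volume);
    }
}
```

In this sketch, an RPC handler or servlet would simply call
restoreFailedVolumes() and report the returned list back to the admin. A real
implementation would also need to rescan the blocks on a restored volume, which
is left out here.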
> Provide volume management functionality for DataNode
> ----------------------------------------------------
>
> Key: HDFS-1362
> URL: https://issues.apache.org/jira/browse/HDFS-1362
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: data-node
> Reporter: Wang Xu
> Assignee: Wang Xu
> Attachments: HDFS-1362.txt, Provide_volume_management_for_DN_v1.pdf
>
>
> The current management unit in Hadoop is a node, i.e. if a node fails, it
> will be kicked out and all the data on the node will be replicated.
> As almost all SATA controllers support hotplug, we add a new command line
> interface to the datanode so that it can list, add or remove a volume online,
> which means we can change a disk without decommissioning the node. Moreover,
> if the failed disk is still readable and the node has enough space, it can
> migrate data on that disk to other disks in the same node.
> A more detailed design document will be attached.
> The original version in our lab is implemented against the 0.20 datanode
> directly. Would it be better to implement it in contrib? Or are there any
> other suggestions?