[
https://issues.apache.org/jira/browse/HDFS-7722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lei (Eddy) Xu updated HDFS-7722:
--------------------------------
Attachment: HDFS-7722.002.patch
Updated the patch to address {{TestDataNodeVolumeFailureReporting}} failures.
Hi, [~cnauroth], I found that in
{{TestDataNodeVolumeFailureReporting#testDataNodeReconfigureWithVolumeFailures}},
you assumed that removing a volume can clear the failed volume info. However,
this patch assumes that a volume is removed completely when {{checkDirs}}
finds an error, while the {{VolumeFailureInfo}} is kept for reporting
purposes.
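To make the mechanism concrete, here is a minimal sketch of the intended behavior; names like {{handleVolumeFailure}} and {{volumeFailureInfos}} are simplified stand-ins for the real {{FsDatasetImpl}} code paths, not the exact patch:
{code:java}
// Sketch only: on a disk error, the volume is dropped from the active set
// entirely, but a record of the failure is retained for reporting.
private void handleVolumeFailure(FsVolumeImpl vol) {
  // Remove the volume (and its replicas) completely, so a later -reconfig
  // can load a fresh disk at the same mount point without a config change.
  volumes.removeVolume(vol);
  // Keep the failure visible to reporting; this record is intentionally
  // not tied to the active volume list, so it survives the removal.
  volumeFailureInfos.put(vol.getBasePath(),
      new VolumeFailureInfo(vol.getBasePath(), Time.monotonicNow()));
}
{code}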
* The pro is that users can directly run {{-reconfig}} to load a new disk
without changing {{dfs.data.dirs}}.
* The con is that, as shown in your test, we cannot use {{-reconfig}} to clear
the {{VolumeFailureInfo}}, since this volume can no longer be found by
{{DataNode#parseChangedVolumes()}}; see the sketch after this list.
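For reference, the lookup that makes the con unavoidable could be sketched like this; it is a simplification of what {{DataNode#parseChangedVolumes()}} effectively does when computing volumes to remove, and the helper below is hypothetical:
{code:java}
// Sketch only: -reconfig computes "removed" volumes by diffing the new
// configuration against the *currently active* volumes. A volume that
// checkDirs already dropped is in neither set, so no removal entry is
// produced for it and its VolumeFailureInfo is never cleared.
List<File> volumesToRemove(Collection<String> newDataDirs,
                           Collection<File> activeVolumes) {
  List<File> toRemove = new ArrayList<File>();
  for (File vol : activeVolumes) {
    if (!newDataDirs.contains(vol.getAbsolutePath())) {
      toRemove.add(vol);  // only still-active volumes can be scheduled here
    }
  }
  return toRemove;        // the failed, already-removed volume never appears
}
{code}
Under this behavior, the user-facing flow after replacing a failed disk would be to simply re-run the reconfiguration, e.g. something like {{hdfs dfsadmin -reconfig datanode <host:ipc_port> start}} (exact CLI form may differ by release), with no edit to the configuration file.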
Does this make sense to you? Would you mind sharing your opinions?
Thanks!
> DataNode#checkDiskError should also remove Storage when error is found.
> -----------------------------------------------------------------------
>
> Key: HDFS-7722
> URL: https://issues.apache.org/jira/browse/HDFS-7722
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.6.0
> Reporter: Lei (Eddy) Xu
> Assignee: Lei (Eddy) Xu
> Attachments: HDFS-7722.000.patch, HDFS-7722.001.patch,
> HDFS-7722.002.patch
>
>
> When {{DataNode#checkDiskError}} finds disk errors, it removes all block
> metadata from {{FsDatasetImpl}}. However, it does not remove the
> corresponding {{DataStorage}} and {{BlockPoolSliceStorage}}.
> The result is that we cannot directly run {{reconfig}} to hot-swap the
> failed disks without changing the configuration file.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)