[
https://issues.apache.org/jira/browse/HDFS-7722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lei (Eddy) Xu updated HDFS-7722:
--------------------------------
Attachment: HDFS-7722.001.patch
Hi, [~cmccabe]. Thanks for reviewing.
I updated the patch based on your inputs.
Now, {{checkDirs()}} shares the same logic as {{DataNode#refreshVolumes()}},
because we'd like to remove everything associated with the failed volumes, i.e.,
the {{blockInfos}} and {{FsVolumeImpls}} in {{FsDataset}} and the storage dirs in
{{DataStorage}}. The existing {{checkDirs()}} logic only removes the {{blockInfo}}
and {{FsVolumeImpl}} in {{FsDataset}}. Thus {{checkDirs()}} now returns the failed
volumes all the way back to {{DataNode}}.
For this reason, I chose to let {{checkDirs()}} return
{{Set<File>}} instead of {{Set<FsVolumeImpl/FsVolumeRef>}}, since these volumes
will be consumed in {{DataNode}}. I think that {{FsVolumeRef}} should only be
used when there are I/Os on the volume.
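To make the intended flow concrete, here is a minimal sketch of the idea, not the patch itself. {{SketchDataset}}, {{SketchStorage}} and the {{unhealthy}} set are hypothetical stand-ins for {{FsDataset}}, {{DataStorage}} and the real disk check; the only point is that the dataset reports failed volumes as plain {{File}} roots and the caller uses that set to clean up the storage side as well.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins only; these are NOT the real HDFS classes.
class VolumeCheckSketch {

  /** Stand-in for the dataset: maps each volume root to its block names. */
  static class SketchDataset {
    final Map<File, List<String>> blocksByVolume = new HashMap<>();

    /**
     * Drops every volume listed in 'unhealthy' (a simulated disk check)
     * and returns the roots of the volumes that were dropped.
     */
    Set<File> checkDirs(Set<File> unhealthy) {
      Set<File> failed = new HashSet<>();
      // Iterate over a copy so entries can be removed while looping.
      for (File root : new ArrayList<>(blocksByVolume.keySet())) {
        if (unhealthy.contains(root)) {
          blocksByVolume.remove(root); // forget the volume and its blocks
          failed.add(root);
        }
      }
      return failed;
    }
  }

  /** Stand-in for the storage layer: tracks the known storage dirs. */
  static class SketchStorage {
    final Set<File> storageDirs = new HashSet<>();

    void removeVolumes(Set<File> failed) {
      storageDirs.removeAll(failed);
    }
  }

  /** Caller side: consume the failed roots and clean up storage too. */
  static Set<File> checkDiskError(SketchDataset dataset,
                                  SketchStorage storage,
                                  Set<File> unhealthy) {
    Set<File> failed = dataset.checkDirs(unhealthy);
    storage.removeVolumes(failed);
    return failed;
  }

  public static void main(String[] args) {
    File d1 = new File("/data1");
    File d2 = new File("/data2");
    SketchDataset dataset = new SketchDataset();
    dataset.blocksByVolume.put(d1, new ArrayList<>(List.of("blk_1")));
    dataset.blocksByVolume.put(d2, new ArrayList<>(List.of("blk_2")));
    SketchStorage storage = new SketchStorage();
    storage.storageDirs.add(d1);
    storage.storageDirs.add(d2);

    Set<File> unhealthy = new HashSet<>();
    unhealthy.add(d2);
    System.out.println("failed: " + checkDiskError(dataset, storage, unhealthy));
    System.out.println("remaining storage dirs: " + storage.storageDirs);
  }
}
```

Returning plain {{File}} roots keeps the caller free of any volume reference counting, which is the trade-off argued for above.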
Would you mind taking another look?
bq. Please remember that this scans all files on a volume, which is an
expensive operation.
{{FsVolumeList#checkDirs}} only checks access permissions on all
subdirectories and does not read files. I agree that it can still be problematic, so
I will file a follow-up JIRA to throttle it.
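For illustration, a permission-only walk of the kind described above might look roughly like the following. This is a sketch using plain {{java.io.File}} calls, not the actual {{FsVolumeList}} code, and {{DirAccessCheckSketch}} is a hypothetical name. It never opens a regular file, but it still visits every directory, so its cost grows with the number of directories on the volume.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical sketch; not the real HDFS implementation.
class DirAccessCheckSketch {

  /**
   * Verifies that 'dir' and every directory beneath it is a readable,
   * writable and traversable directory. Regular files are skipped and
   * never opened. Symbolic-link cycles are not handled here.
   */
  static void checkDirs(File dir) throws IOException {
    if (!dir.isDirectory()) {
      throw new IOException("Not a directory: " + dir);
    }
    if (!dir.canRead() || !dir.canWrite() || !dir.canExecute()) {
      throw new IOException("Insufficient permissions on: " + dir);
    }
    File[] children = dir.listFiles();
    if (children == null) {
      // listFiles() returns null on an I/O error.
      throw new IOException("Cannot list directory: " + dir);
    }
    for (File child : children) {
      if (child.isDirectory()) {
        checkDirs(child); // recurse into subdirectories only
      }
    }
  }

  public static void main(String[] args) {
    File root = new File(args.length > 0 ? args[0] : ".");
    try {
      checkDirs(root);
      System.out.println("OK: " + root);
    } catch (IOException e) {
      System.out.println("FAILED: " + e.getMessage());
    }
  }
}
```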
> DataNode#checkDiskError should also remove Storage when error is found.
> -----------------------------------------------------------------------
>
> Key: HDFS-7722
> URL: https://issues.apache.org/jira/browse/HDFS-7722
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.6.0
> Reporter: Lei (Eddy) Xu
> Assignee: Lei (Eddy) Xu
> Attachments: HDFS-7722.000.patch, HDFS-7722.001.patch
>
>
> When {{DataNode#checkDiskError}} finds disk errors, it removes all block
> metadata from {{FsDatasetImpl}}. However, it does not remove the
> corresponding {{DataStorage}} and {{BlockPoolSliceStorage}}.
> As a result, we cannot directly run {{reconfig}} to hot swap the
> failed disks without changing the configuration file.