[ 
https://issues.apache.org/jira/browse/HDFS-7722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-7722:
--------------------------------
    Attachment: HDFS-7722.001.patch

Hi, [~cmccabe]. Thanks for reviewing. 

I updated the patch based on your input. 

Now, {{checkDirs()}} shares the same logic with {{DataNode#refreshVolumes()}}, 
because we'd like to remove everything about the volumes, i.e., 
{{blockInfos}} and {{FsVolumeImpls}} in {{FsDataset}}, and storage dirs in 
{{DataStorage}}. The existing {{checkDirs()}} logic only removes {{blockInfo}} 
and {{FsVolumeImpl}} in {{FsDataset}}. Thus {{checkDirs()}} now returns the 
failed volumes all the way back to {{DataNode}}.

For this reason, I chose to let {{checkDirs()}} return 
{{Set<File>}} instead of {{Set<FsVolumeImpl/FsVolumeRef>}}, since these volumes 
will be consumed in {{DataNode}}. I think that {{FsVolumeRef}} should only be 
used when there are I/Os on the volume.
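
To illustrate the shape of this change, here is a rough sketch only, not the code in the patch. All class and method names below ({{FailedVolumeSketch}}, {{Volume}}, {{VolumeList}}, {{StorageDirs}}) are hypothetical stand-ins for {{FsVolumeImpl}}, {{FsVolumeList}} and {{DataStorage}}, and the health probe is an assumption. The dataset side drops the failed volumes and reports their directories as plain {{File}}s, and the caller uses that set to drop the matching storage dirs:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for illustration; these are NOT the real Hadoop classes.
public class FailedVolumeSketch {

  /** Simplified volume: just a base directory plus an assumed health probe. */
  static class Volume {
    final File baseDir;

    Volume(File baseDir) {
      this.baseDir = baseDir;
    }

    boolean isHealthy() {
      return baseDir.isDirectory() && baseDir.canRead()
          && baseDir.canWrite() && baseDir.canExecute();
    }
  }

  /** Dataset side: removes failed volumes and reports their directories. */
  static class VolumeList {
    final List<Volume> volumes = new ArrayList<>();

    Set<File> checkDirs() {
      Set<File> failed = new HashSet<>();
      for (Iterator<Volume> it = volumes.iterator(); it.hasNext();) {
        Volume v = it.next();
        if (!v.isHealthy()) {
          it.remove();            // drop the volume from the dataset
          failed.add(v.baseDir);  // report a plain File, not a volume reference
        }
      }
      return failed;
    }
  }

  /** Caller side: storage dirs tracked separately from the dataset. */
  static class StorageDirs {
    final Set<File> dirs = new HashSet<>();

    void removeAll(Set<File> failed) {
      dirs.removeAll(failed);
    }
  }
}
```

Returning {{File}}s keeps the caller from holding a reference to a volume that has already been removed from the dataset.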

Would you mind taking another look?

bq.  Please remember that this scans all files on a volume, which is an 
expensive operation.

{{FsVolumeList#checkDirs}} only checks access permissions on all 
subdirectories and does not read files. I agree that it can still be 
problematic; I will file a follow-up JIRA to throttle it.
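
As a minimal sketch of what a permission-only check looks like (an illustration under my own assumptions, not the actual Hadoop implementation; {{DirPermissionCheck}} and {{checkDirTree}} are hypothetical names), the walk below visits every subdirectory and tests its access bits, but never opens a regular file:

```java
import java.io.File;

// Hypothetical helper for illustration; not the real Hadoop code.
public class DirPermissionCheck {

  /**
   * Returns true if dir and every directory below it is readable,
   * writable and executable. Regular files are listed but never opened.
   */
  static boolean checkDirTree(File dir) {
    if (!dir.isDirectory() || !dir.canRead()
        || !dir.canWrite() || !dir.canExecute()) {
      return false;
    }
    File[] children = dir.listFiles();
    if (children == null) {
      return false; // listing failed, treat the directory as bad
    }
    for (File child : children) {
      // Only recurse into directories; plain files are skipped.
      if (child.isDirectory() && !checkDirTree(child)) {
        return false;
      }
    }
    return true;
  }
}
```

Even without reading file contents, the cost grows with the number of directories and entries listed, which is why throttling is still worth a follow-up.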


> DataNode#checkDiskError should also remove Storage when error is found.
> -----------------------------------------------------------------------
>
>                 Key: HDFS-7722
>                 URL: https://issues.apache.org/jira/browse/HDFS-7722
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.6.0
>            Reporter: Lei (Eddy) Xu
>            Assignee: Lei (Eddy) Xu
>         Attachments: HDFS-7722.000.patch, HDFS-7722.001.patch
>
>
> When {{DataNode#checkDiskError}} finds disk errors, it removes all block 
> metadata from {{FsDatasetImpl}}. However, it does not remove the 
> corresponding {{DataStorage}} and {{BlockPoolSliceStorage}}. 
> As a result, we cannot directly run {{reconfig}} to hot swap the 
> failed disks without changing the configuration file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
