[
https://issues.apache.org/jira/browse/HDFS-14333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786183#comment-16786183
]
Stephen O'Donnell commented on HDFS-14333:
------------------------------------------
I have added a new patch that does the following:
1. Added the same exception catch / throw logic around
FsVolumeList.addBlockPool - even though a failure there should never happen, I
think it makes sense to include it for consistency.
2. I moved the throw statement to the bottom of FsVolumeList.addBlockPool()
and FsVolumeList.getAllVolumesMap(), as otherwise a single disk failure would
prevent the timing log message from being written (see the sketch after this
list).
3. I added a test that simulates the failure in the same way I was able to
manually - by making one of the directories inside a volume unreadable. I'm
not overly happy with this test, as it needs to pause to wait for a datanode
restart and to ensure the DN stays up, but it does reproduce the issue without
the fix in place and passes with the fix applied. There are other tests in
that class that do similar things.
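To make points 1 and 2 concrete, here is a minimal, self-contained sketch of
the pattern (illustrative names only, not the actual FsVolumeList code in the
patch): each volume's work runs in its own thread, any IOException is recorded
rather than propagated immediately, the timing message is always logged, and
the exception is only re-thrown at the bottom of the method.
{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class CollectThenThrowSketch {

  /** Stands in for the per-volume work (e.g. addBlockPool / getVolumeMap). */
  interface VolumeTask {
    void run() throws IOException;
  }

  static void addBlockPool(List<VolumeTask> volumes) throws IOException {
    long start = System.nanoTime();
    // collect per-volume failures instead of letting the first one escape
    List<IOException> exceptions =
        Collections.synchronizedList(new ArrayList<>());
    List<Thread> threads = new ArrayList<>();
    for (VolumeTask v : volumes) {
      Thread t = new Thread(() -> {
        try {
          v.run();
        } catch (IOException ioe) {
          // remember the failure, but let the other volumes finish
          exceptions.add(ioe);
        }
      });
      threads.add(t);
      t.start();
    }
    for (Thread t : threads) {
      try {
        t.join();
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        throw new IOException("Interrupted while adding block pool", ie);
      }
    }
    // the timing message is always written, even if a volume failed
    System.out.printf("Total time to add all block pools: %d ms%n",
        (System.nanoTime() - start) / 1_000_000);
    // only re-throw at the bottom of the method, once every volume has been
    // processed, so the caller sees a single volume failure
    if (!exceptions.isEmpty()) {
      throw exceptions.get(0);
    }
  }
}
{code}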
I looked into SimulatedFsDataset to see if I could mock the failure more
cleanly, but unfortunately addBlockPool() in SimulatedFsDataset does not have a
"throws IOException" clause like FsDatasetImpl does, so I could not override it
to do what I wanted, and I didn't want to change it in case that had any
knock-on effects on other things that use it. FsDatasetImpl is also a private
class, so I was not able to sub-class it and override the method either. I did
not think it was worth creating a completely new implementation of the
interface just for this test.
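For reference, the rough shape of the test approach is sketched below. This is
only an illustration of the idea, assuming a MiniDFSCluster with one datanode
and two storage directories; the directory path, config value and assertions
are not the actual test in the patch.
{code}
import static org.junit.Assert.assertTrue;
import static org.junit.Assume.assumeTrue;

import java.io.File;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.junit.Test;

public class TestDnStartupWithUnreadableDirSketch {

  @Test
  public void datanodeStaysUpWithOneUnreadableDirectory() throws Exception {
    Configuration conf = new HdfsConfiguration();
    // tolerate one failed volume so the DN is allowed to keep running
    conf.setInt("dfs.datanode.failed.volumes.tolerated", 1);
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
        .numDataNodes(1)
        .storagesPerDatanode(2)
        .build();
    try {
      cluster.waitActive();
      String bpid = cluster.getNamesystem().getBlockPoolId();
      // make the finalized dir of the first volume unreadable - this is the
      // manual reproduction step that makes addToReplicasMap throw on restart
      File volumeDir = cluster.getInstanceStorageDir(0, 0);
      File finalizedDir =
          new File(volumeDir, "current/" + bpid + "/current/finalized");
      assertTrue(finalizedDir.isDirectory());
      // setReadable is a no-op when running as root or on Windows
      assumeTrue(finalizedDir.setReadable(false, false));
      try {
        // without the fix the DN fails registration and eventually exits
        // with an "all volumes failed" error; with it, the restart succeeds
        cluster.restartDataNode(0, true);
        cluster.waitActive();
        assertTrue(cluster.getDataNodes().get(0).isDatanodeFullyStarted());
      } finally {
        finalizedDir.setReadable(true, false);
      }
    } finally {
      cluster.shutdown();
    }
  }
}
{code}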
Please let me know what you think, especially about the testing approach.
> Datanode fails to start if any disk has errors during Namenode registration
> ---------------------------------------------------------------------------
>
> Key: HDFS-14333
> URL: https://issues.apache.org/jira/browse/HDFS-14333
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 3.3.0
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14333.001.patch, HDFS-14333.002.patch
>
>
> This is closely related to HDFS-9908, where it was reported that a datanode
> would fail to start if an IO error occurred on a single disk when running du
> during Datanode registration. That Jira was closed due to HADOOP-12973, which
> refactored how du is called and prevents any exception from being thrown.
> However, this problem can still occur if the volume has errors (e.g.
> permission problems or filesystem corruption) when the disk is scanned to
> load all the replicas. The method chain is:
> DataNode.initBlockPool -> FsDatasetImpl.addBlockPool ->
> FsVolumeList.getAllVolumesMap -> throws an exception which goes unhandled.
> The DN logs will contain a stack trace for the problem volume, so the
> workaround is to remove the volume from the DN config and the DN will start,
> but the logs are a little confusing, so it's not always obvious what the
> issue is.
> These are cut-down logs from an occurrence of this issue.
> {code}
> 2019-03-01 08:58:24,830 INFO
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning
> block pool BP-240961797-x.x.x.x-1392827522027 on volume
> /data/18/dfs/dn/current...
> ...
> 2019-03-01 08:58:27,029 WARN org.apache.hadoop.fs.CachingGetSpaceUsed: Could
> not get disk usage information
> ExitCodeException exitCode=1: du: cannot read directory
> `/data/18/dfs/dn/current/BP-240961797-x.x.x.x-1392827522027/current/finalized/subdir149/subdir215':
> Permission denied
> du: cannot read directory
> `/data/18/dfs/dn/current/BP-240961797-x.x.x.x-1392827522027/current/finalized/subdir149/subdir213':
> Permission denied
> du: cannot read directory
> `/data/18/dfs/dn/current/BP-240961797-x.x.x.x-1392827522027/current/finalized/subdir97/subdir25':
> Permission denied
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
> at org.apache.hadoop.util.Shell.run(Shell.java:504)
> at org.apache.hadoop.fs.DU$DUShell.startRefresh(DU.java:61)
> at org.apache.hadoop.fs.DU.refresh(DU.java:53)
> at
> org.apache.hadoop.fs.CachingGetSpaceUsed.init(CachingGetSpaceUsed.java:84)
> at
> org.apache.hadoop.fs.GetSpaceUsed$Builder.build(GetSpaceUsed.java:166)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.<init>(BlockPoolSlice.java:145)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.addBlockPool(FsVolumeImpl.java:881)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$2.run(FsVolumeList.java:412)
> ...
> 2019-03-01 08:58:27,043 INFO
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time
> taken to scan block pool BP-240961797-x.x.x.x-1392827522027 on
> /data/18/dfs/dn/current: 2202ms
> {code}
> So we can see a du error occurred and was logged but not re-thrown (due to
> HADOOP-12973), and the block pool scan completed. However, in the 'add
> replicas to map' logic, we then got another exception stemming from the same
> problem:
> {code}
> 2019-03-01 08:58:27,564 INFO
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding
> replicas to map for block pool BP-240961797-x.x.x.x-1392827522027 on volume
> /data/18/dfs/dn/current...
> ...
> 2019-03-01 08:58:31,155 INFO
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Caught
> exception while adding replicas from /data/18/dfs/dn/current. Will throw
> later.
> java.io.IOException: Invalid directory or I/O error occurred for dir:
> /data/18/dfs/dn/current/BP-240961797-x.x.x.x-1392827522027/current/finalized/subdir149/subdir215
> at org.apache.hadoop.fs.FileUtil.listFiles(FileUtil.java:1167)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:445)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.getVolumeMap(BlockPoolSlice.java:342)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getVolumeMap(FsVolumeImpl.java:861)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$1.run(FsVolumeList.java:191)
> < The message 2019-03-01 08:59:00,989 INFO
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to
> add replicas to map for block pool BP-240961797-x.x.x.x-1392827522027 on
> volume xxx did not appear for this volume as it failed >
> {code}
> The exception is re-thrown, so the DN fails registration and then retries.
> It then finds all the volumes already locked and exits with an 'all volumes
> failed' error.
> I believe we should handle the failing volume like a runtime volume failure
> and only abort the DN if too many volumes have failed.
> I will post a patch for this.
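> A minimal sketch of what that could look like at the point where the block
> pool is added, assuming the existing dfs.datanode.failed.volumes.tolerated
> setting; the helper names and surrounding structure below are hypothetical,
> not the actual patch:
> {code}
> // Illustrative only: markVolumeFailed() is a hypothetical helper. The point
> // is to collect per-volume failures and abort the DN only when the
> // configured tolerance is exceeded.
> int tolerated = conf.getInt("dfs.datanode.failed.volumes.tolerated", 0);
> int failed = 0;
> for (FsVolumeImpl v : volumes) {
>   try {
>     v.addBlockPool(bpid, conf);        // scan the replicas on this volume
>   } catch (IOException ioe) {
>     LOG.warn("Treating " + v + " as a failed volume", ioe);
>     markVolumeFailed(v);               // hypothetical helper
>     failed++;
>   }
> }
> if (failed > tolerated) {
>   throw new IOException(failed + " volumes failed, tolerated: " + tolerated);
> }
> {code}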