[
https://issues.apache.org/jira/browse/HDFS-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180534#comment-15180534
]
Wei-Chiu Chuang commented on HDFS-9908:
---------------------------------------
For completeness, here's the related logs in DN:
*DN connects to NN:*
2016-02-18 02:20:37,949 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
Opened IPC server at /10.107.162.126:50020
2016-02-18 02:20:38,034 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
Refresh request received for nameservices: nameservice1
2016-02-18 02:20:38,067 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
Starting BPOfferServices for nameservices: nameservice1
2016-02-18 02:20:38,076 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
Block pool <registering> (Datanode Uuid unassigned) service to
namenode1.weichiu.com/10.107.162.110:8022 starting to offer service
2016-02-18 02:20:38,077 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
Block pool <registering> (Datanode Uuid unassigned) service to
namenode2.weichiu.com/10.107.162.120:8022 starting to offer service
2016-02-18 02:20:38,085 INFO org.apache.hadoop.ipc.Server: IPC Server
Responder: starting
2016-02-18 02:20:38,085 INFO org.apache.hadoop.ipc.Server: IPC Server listener
on 50020: starting
2016-02-18 02:20:39,211 INFO org.apache.hadoop.ipc.Client: Retrying connect to
server: namenode1.weichiu.com/10.107.162.110:8022. Already tried 0 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
*Then DN does the handshake, gets the bpid from NN, and then analyzes storage:*
2016-02-18 02:20:53,512 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock
on /data/1/dfs/dn/in_use.lock acquired by nodename [email protected]
2016-02-18 02:20:53,563 INFO org.apache.hadoop.hdfs.server.common.Storage:
Analyzing storage directories for bpid BP-1018136951-49.4.167.110-1403564146510
2016-02-18 02:20:53,563 INFO org.apache.hadoop.hdfs.server.common.Storage:
Locking is disabled
2016-02-18 02:20:53,606 INFO org.apache.hadoop.hdfs.server.common.Storage:
Restored 0 block files from trash.
....
all of the disks are scanned successfully
*But one of them failed to scan:*
2016-02-18 02:23:36,224 INFO
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Caught
exception while scanning /data/8/dfs/dn/current.
Will throw later.
ExitCodeException exitCode=1: du: cannot access
`/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir228/subdir11/blk_1088686909':
Input/output error
du: cannot access
`/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir228/subdir11/blk_1088686909_14954023.meta':
Input/output error
du: cannot access
`/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir69/blk_1093551560':
Input/output error
du: cannot access
`/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir69/blk_1093551560_19818947.meta':
Input/output error
du: cannot access
`/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir116/blk_1093563577':
Input/output error
du: cannot access
`/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir116/blk_1093563577_19830979.meta':
Input/output error
du: cannot access
`/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir71/blk_1093552125':
Input/output error
du: cannot access
`/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir71/blk_1093551897':
Input/output error
du: cannot access
`/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir71/blk_1093551897_19819284.meta':
Input/output error
du: cannot access
`/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir71/blk_1093552003_19819390.meta':
Input/output error
du: cannot access
`/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir71/blk_1093552003':
Input/output error
du: cannot access
`/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir71/blk_1093552125_19819512.meta':
Input/output error
du: cannot access
`/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir70/blk_1093551747':
Input/output error
du: cannot access
`/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir70/blk_1093551747_19819134.meta':
Input/output error
du: cannot access
`/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir68/blk_1093551249_19818632.meta':
Input/output error
du: cannot access
`/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir68/blk_1093551249':
Input/output error
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.fs.DU.run(DU.java:190)
at org.apache.hadoop.fs.DU.<init>(DU.java:70)
at org.apache.hadoop.fs.DU.<init>(DU.java:95)
at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.<init>(BlockPoolSlice.java:116)
at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.addBlockPool(FsVolumeImpl.java:284)
at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$2.run(FsVolumeList.java:247)
*At this point, the DN attempts to handshake with the NN again, only to find
that the disks appear to be in use because of the leftover lock files:*
2016-02-18 02:23:36,238 INFO org.apache.hadoop.hdfs.server.common.Storage:
Storage directory [DISK]file:/data/1/dfs/dn/ has already been used.
2016-02-18 02:23:36,273 INFO org.apache.hadoop.hdfs.server.common.Storage:
Analyzing storage directories for bpid BP-1018136951-49.4.167.110-1403564146510
2016-02-18 02:23:36,273 WARN org.apache.hadoop.hdfs.server.common.Storage:
Failed to analyze storage directories for block pool
BP-1018136951-49.4.167.110-1403564146510
java.io.IOException: BlockPoolSliceStorage.recoverTransitionRead: attempt to
load an used block storage:
/data/1/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510
at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:210)
at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:242)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:391)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:472)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1322)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1292)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:320)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:862)
at java.lang.Thread.run(Thread.java:745)
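For context on why the retry sees the directory as "already been used": the DataNode's in_use.lock is held through a java.nio file lock, and a second acquisition attempt from the same JVM fails while the first handshake attempt still holds it. The following is a minimal, Hadoop-free illustration of those locking semantics only, not the actual Storage code path; the class name and temp file are invented for the demo:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class InUseLockDemo {
    /**
     * Try to take an exclusive lock on the channel's file.
     * Returns false both when another process holds the lock and when
     * this same JVM already holds it (the DN-retry case, which surfaces
     * as OverlappingFileLockException rather than a null FileLock).
     */
    static boolean tryAcquire(FileChannel ch) throws IOException {
        try {
            FileLock lock = ch.tryLock();
            return lock != null;
        } catch (OverlappingFileLockException e) {
            return false; // lock is still held by this very process
        }
    }

    public static void main(String[] args) throws IOException {
        Path lockFile = Files.createTempFile("in_use", ".lock");
        try (FileChannel first = FileChannel.open(lockFile, StandardOpenOption.WRITE);
             FileChannel second = FileChannel.open(lockFile, StandardOpenOption.WRITE)) {
            // First handshake attempt: the lock is free, acquisition succeeds.
            System.out.println("first attempt:  " + tryAcquire(first));
            // Second attempt from the same JVM: the first lock was never
            // released, so this fails, just like the DN's reconnect.
            System.out.println("second attempt: " + tryAcquire(second));
        } finally {
            Files.deleteIfExists(lockFile);
        }
    }
}
```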
*Finally, it concludes that all disks are in use and cannot complete the
second handshake:*
2016-02-18 02:23:36,558 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode:
Initialization failed for Block pool <registering> (Datanode Uuid unassigned)
service to namenode1.weichiu.com/10.107.162.120:8022. Exiting.
java.io.IOException: All specified directories are failed to load.
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:473)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1322)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1292)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:320)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:862)
at java.lang.Thread.run(Thread.java:745)
2016-02-18 02:23:36,559 WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
Block pool ID needed, but service not yet registered with NN
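A sketch of the tolerance this issue asks for, in simplified Hadoop-free Java (not the committed patch; the `Volume` interface and class name are invented): collect per-volume scan failures instead of rethrowing the first one, abort only when every volume failed, and hand the failed volumes back to the caller for removal, so a single bad disk cannot block registration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class TolerantBlockPoolAdd {
    // Stand-in for FsVolumeImpl: scanning a block pool either succeeds or throws.
    interface Volume {
        String name();
        void addBlockPool(String bpid) throws IOException;
    }

    /**
     * Scan all volumes in parallel, as FsVolumeList#addBlockPool does, but
     * fail the whole operation only when *every* volume failed. Returns the
     * volumes that failed so the caller can detach them and carry on.
     */
    static List<Volume> addBlockPool(List<Volume> volumes, String bpid)
            throws IOException {
        List<Volume> failed = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();
        for (Volume v : volumes) {
            Thread t = new Thread(() -> {
                try {
                    v.addBlockPool(bpid);
                } catch (IOException ioe) {
                    failed.add(v); // record the failure, do not rethrow
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException ie) {
                throw new IOException(ie);
            }
        }
        if (failed.size() == volumes.size()) {
            // Only now is it fatal: no usable storage remains at all.
            throw new IOException("All volumes failed for block pool " + bpid);
        }
        return failed;
    }

    public static void main(String[] args) throws IOException {
        Volume good = new Volume() {
            public String name() { return "/data/1"; }
            public void addBlockPool(String bpid) { /* scans fine */ }
        };
        Volume bad = new Volume() {
            public String name() { return "/data/8"; }
            public void addBlockPool(String bpid) throws IOException {
                throw new IOException("Input/output error");
            }
        };
        List<Volume> failed = addBlockPool(Arrays.asList(good, bad), "BP-1");
        System.out.println("failed volumes: " + failed.size()); // 1; DN keeps going
    }
}
```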
> Datanode should tolerate disk failure during NN handshake
> ---------------------------------------------------------
>
> Key: HDFS-9908
> URL: https://issues.apache.org/jira/browse/HDFS-9908
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.5.0
> Environment: CDH5.3.3
> Reporter: Wei-Chiu Chuang
> Assignee: Wei-Chiu Chuang
>
> DN may treat a disk failure exception as an NN handshake exception, which can
> prevent a DN from joining the cluster even when most of its disks are healthy.
> During the NN handshake, the DN initializes block pools. It creates a lock
> file per disk and then scans the volumes. However, if the scan throws an
> exception due to disk failure, the DN treats it as an exception caused by the
> NN being inconsistent with local storage (see {{DataNode#initBlockPool}}). As
> a result, it attempts to reconnect to the NN.
> However, at this point the DN has not deleted the lock files on its disks.
> When it reconnects to the NN, it concludes that the same disks are already in
> use, so the handshake fails again because no disk can be used (due to
> locking), and this repeats indefinitely. It happens even if the DN has
> multiple disks and only one of them fails: the DN cannot connect to the NN
> despite having just one failing disk. Note that it is possible to
> successfully create a lock file on a disk and then hit an error while
> scanning it.
> We saw this on a CDH 5.3.3 cluster (which is based on Apache Hadoop 2.5.0),
> and the same bug is still present on the 3.0.0 trunk branch. The root cause
> is that the DN treats an internal error (a single disk failure) as an
> external one (an NN handshake failure), and we should fix that.
> {code:title=DataNode.java}
>   /**
>    * One of the Block Pools has successfully connected to its NN.
>    * This initializes the local storage for that block pool,
>    * checks consistency of the NN's cluster ID, etc.
>    *
>    * If this is the first block pool to register, this also initializes
>    * the datanode-scoped storage.
>    *
>    * @param bpos Block pool offer service
>    * @throws IOException if the NN is inconsistent with the local storage.
>    */
>   void initBlockPool(BPOfferService bpos) throws IOException {
>     NamespaceInfo nsInfo = bpos.getNamespaceInfo();
>     if (nsInfo == null) {
>       throw new IOException("NamespaceInfo not found: Block pool " + bpos
>           + " should have retrieved namespace info before initBlockPool.");
>     }
>
>     setClusterId(nsInfo.clusterID, nsInfo.getBlockPoolID());
>
>     // Register the new block pool with the BP manager.
>     blockPoolManager.addBlockPool(bpos);
>
>     // In the case that this is the first block pool to connect, initialize
>     // the dataset, block scanners, etc.
>     initStorage(nsInfo);
>
>     // Exclude failed disks before initializing the block pools to avoid
>     // startup failures.
>     checkDiskError();
>
>     data.addBlockPool(nsInfo.getBlockPoolID(), conf); // <----- this line throws the disk error exception
>     blockScanner.enableBlockPoolId(bpos.getBlockPoolId());
>     initDirectoryScanner(conf);
>   }
> {code}
> {{FsVolumeList#addBlockPool}} is the source of the exception.
> {code:title=FsVolumeList.java}
>   void addBlockPool(final String bpid, final Configuration conf) throws IOException {
>     long totalStartTime = Time.monotonicNow();
>
>     final List<IOException> exceptions = Collections.synchronizedList(
>         new ArrayList<IOException>());
>     List<Thread> blockPoolAddingThreads = new ArrayList<Thread>();
>     for (final FsVolumeImpl v : volumes) {
>       Thread t = new Thread() {
>         public void run() {
>           try (FsVolumeReference ref = v.obtainReference()) {
>             FsDatasetImpl.LOG.info("Scanning block pool " + bpid +
>                 " on volume " + v + "...");
>             long startTime = Time.monotonicNow();
>             v.addBlockPool(bpid, conf);
>             long timeTaken = Time.monotonicNow() - startTime;
>             FsDatasetImpl.LOG.info("Time taken to scan block pool " + bpid +
>                 " on " + v + ": " + timeTaken + "ms");
>           } catch (ClosedChannelException e) {
>             // ignore.
>           } catch (IOException ioe) {
>             FsDatasetImpl.LOG.info("Caught exception while scanning " + v +
>                 ". Will throw later.", ioe);
>             exceptions.add(ioe);
>           }
>         }
>       };
>       blockPoolAddingThreads.add(t);
>       t.start();
>     }
>     for (Thread t : blockPoolAddingThreads) {
>       try {
>         t.join();
>       } catch (InterruptedException ie) {
>         throw new IOException(ie);
>       }
>     }
>     if (!exceptions.isEmpty()) {
>       throw exceptions.get(0); // <----- here is the origin of the exception
>     }
>
>     long totalTimeTaken = Time.monotonicNow() - totalStartTime;
>     FsDatasetImpl.LOG.info("Total time to scan all replicas for block pool " +
>         bpid + ": " + totalTimeTaken + "ms");
>   }
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)