[ 
https://issues.apache.org/jira/browse/HDFS-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180534#comment-15180534
 ] 

Wei-Chiu Chuang commented on HDFS-9908:
---------------------------------------

For completeness, here are the related logs from the DN:

*DN connects to NN:*
2016-02-18 02:20:37,949 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Opened IPC server at /10.107.162.126:50020
2016-02-18 02:20:38,034 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Refresh request received for nameservices: nameservice1
2016-02-18 02:20:38,067 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Starting BPOfferServices for nameservices: nameservice1
2016-02-18 02:20:38,076 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Block pool <registering> (Datanode Uuid unassigned) service to 
namenode1.weichiu.com/10.107.162.110:8022 starting to offer service
2016-02-18 02:20:38,077 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Block pool <registering> (Datanode Uuid unassigned) service to 
namenode2.weichiu.com/10.107.162.120:8022 starting to offer service
2016-02-18 02:20:38,085 INFO org.apache.hadoop.ipc.Server: IPC Server 
Responder: starting
2016-02-18 02:20:38,085 INFO org.apache.hadoop.ipc.Server: IPC Server listener 
on 50020: starting
2016-02-18 02:20:39,211 INFO org.apache.hadoop.ipc.Client: Retrying connect to 
server: namenode1.weichiu.com/10.107.162.110:8022. Already tried 0 time(s); 
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, 
sleepTime=1000 MILLISECONDS)

*Then the DN does the handshake, gets the bpid from the NN, and then analyzes storage:*

2016-02-18 02:20:53,512 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock 
on /data/1/dfs/dn/in_use.lock acquired by nodename [email protected]
2016-02-18 02:20:53,563 INFO org.apache.hadoop.hdfs.server.common.Storage: 
Analyzing storage directories for bpid BP-1018136951-49.4.167.110-1403564146510
2016-02-18 02:20:53,563 INFO org.apache.hadoop.hdfs.server.common.Storage: 
Locking is disabled
2016-02-18 02:20:53,606 INFO org.apache.hadoop.hdfs.server.common.Storage: 
Restored 0 block files from trash.
....
All of the disks pass this stage successfully.
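
For reference, in_use.lock is an exclusive per-directory file lock. Roughly, acquiring 
one looks like the sketch below; this is only an illustration with made-up names, not 
the actual org.apache.hadoop.hdfs.server.common.Storage code.

{code:title=Illustrative sketch of an in_use.lock-style lock (not Hadoop code)}
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

class StorageDirLock {
  private RandomAccessFile lockFile;
  private FileLock lock;

  /** Try to take an exclusive lock on <dir>/in_use.lock. */
  void tryLock(File storageDir) throws IOException {
    File f = new File(storageDir, "in_use.lock");
    lockFile = new RandomAccessFile(f, "rws");
    try {
      lock = lockFile.getChannel().tryLock();
    } catch (OverlappingFileLockException e) {
      lock = null;  // something in this same JVM already holds the lock
    }
    if (lock == null) {
      lockFile.close();
      throw new IOException("Cannot lock storage " + storageDir
          + ": the directory is already in use");
    }
  }

  /** Release the lock; if this never runs after a failure, a later
   *  attempt will still see the directory as in use. */
  void unlock() throws IOException {
    if (lock != null) {
      lock.release();
      lockFile.close();
      lock = null;
    }
  }
}
{code}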

*But one of them fails to scan:*

2016-02-18 02:23:36,224 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Caught 
exception while scanning /data/8/dfs/dn/current.
Will throw later.
ExitCodeException exitCode=1: du: cannot access `/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir228/subdir11/blk_1088686909': Input/output error
du: cannot access `/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir228/subdir11/blk_1088686909_14954023.meta': Input/output error
du: cannot access `/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir69/blk_1093551560': Input/output error
du: cannot access `/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir69/blk_1093551560_19818947.meta': Input/output error
du: cannot access `/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir116/blk_1093563577': Input/output error
du: cannot access `/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir116/blk_1093563577_19830979.meta': Input/output error
du: cannot access `/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir71/blk_1093552125': Input/output error
du: cannot access `/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir71/blk_1093551897': Input/output error
du: cannot access `/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir71/blk_1093551897_19819284.meta': Input/output error
du: cannot access `/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir71/blk_1093552003_19819390.meta': Input/output error
du: cannot access `/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir71/blk_1093552003': Input/output error
du: cannot access `/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir71/blk_1093552125_19819512.meta': Input/output error
du: cannot access `/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir70/blk_1093551747': Input/output error
du: cannot access `/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir70/blk_1093551747_19819134.meta': Input/output error
du: cannot access `/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir68/blk_1093551249_19818632.meta': Input/output error
du: cannot access `/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir68/blk_1093551249': Input/output error

        at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
        at org.apache.hadoop.util.Shell.run(Shell.java:455)
        at org.apache.hadoop.fs.DU.run(DU.java:190)
        at org.apache.hadoop.fs.DU.<init>(DU.java:70)
        at org.apache.hadoop.fs.DU.<init>(DU.java:95)
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.<init>(BlockPoolSlice.java:116)
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.addBlockPool(FsVolumeImpl.java:284)
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$2.run(FsVolumeList.java:247)

*At this point, the DN attempts the handshake with the NN again, only to find 
that the disks appear to be in use because of the lock files:*

2016-02-18 02:23:36,238 INFO org.apache.hadoop.hdfs.server.common.Storage: 
Storage directory [DISK]file:/data/1/dfs/dn/ has already been used.
2016-02-18 02:23:36,273 INFO org.apache.hadoop.hdfs.server.common.Storage: 
Analyzing storage directories for bpid BP-1018136951-49.4.167.110-1403564146510
2016-02-18 02:23:36,273 WARN org.apache.hadoop.hdfs.server.common.Storage: 
Failed to analyze storage directories for block pool 
BP-1018136951-49.4.167.110-1403564146510
java.io.IOException: BlockPoolSliceStorage.recoverTransitionRead: attempt to 
load an used block storage: 
/data/1/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510
        at 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:210)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:242)
        at 
org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:391)
        at 
org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:472)
        at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1322)
        at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1292)
        at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:320)
        at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225)
        at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:862)
        at java.lang.Thread.run(Thread.java:745)

*Finally, it concludes that all disks are in use and cannot complete the 
second handshake:*
2016-02-18 02:23:36,558 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: 
Initialization failed for Block pool <registering> (Datanode Uuid unassigned) 
service to namenode1.weichiu.com/10.107.162.120:8022. Exiting.
java.io.IOException: All specified directories are failed to load.
        at 
org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:473)
        at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1322)
        at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1292)
        at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:320)
        at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225)
        at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:862)
        at java.lang.Thread.run(Thread.java:745)
2016-02-18 02:23:36,559 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
Block pool ID needed, but service not yet registered with NN
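
To summarize the sequence: the scan failure on /data/8 aborts block pool 
initialization only after the locks on every disk have already been taken, and 
nothing releases them before the retry, so the second handshake sees every 
directory as "already been used" and gives up with "All specified directories 
are failed to load." A minimal self-contained model of that loop is below; all 
names are made up for illustration and none of this is DataNode code.

{code:title=Model of the failure loop (illustrative only, not Hadoop code)}
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HandshakeLoopModel {
  // Stand-in for the per-disk in_use.lock state that survives a failed attempt.
  private final Set<String> lockedDirs = new HashSet<>();

  void initBlockPool(List<String> dirs, String badDir) throws IOException {
    List<String> usable = new ArrayList<>();
    for (String d : dirs) {
      if (lockedDirs.contains(d)) {
        // Mirrors "Storage directory ... has already been used."
        System.out.println(d + " has already been used.");
        continue;
      }
      lockedDirs.add(d);  // lock taken; never released when the scan below fails
      usable.add(d);
    }
    if (usable.isEmpty()) {
      // Mirrors "All specified directories are failed to load."
      throw new IOException("All specified directories are failed to load.");
    }
    for (String d : usable) {
      if (d.equals(badDir)) {
        // Mirrors the I/O error thrown while scanning /data/8
        throw new IOException("Input/output error while scanning " + d);
      }
    }
  }

  public static void main(String[] args) {
    HandshakeLoopModel dn = new HandshakeLoopModel();
    List<String> dirs = Arrays.asList("/data/1", "/data/2", "/data/8");
    for (int attempt = 1; attempt <= 2; attempt++) {
      try {
        dn.initBlockPool(dirs, "/data/8");
      } catch (IOException e) {
        System.out.println("attempt " + attempt + " failed: " + e.getMessage());
      }
    }
    // Attempt 1 fails on the single bad disk; attempt 2 fails for all disks,
    // including the healthy ones, because the locks were never released.
  }
}
{code}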

> Datanode should tolerate disk failure during NN handshake
> ---------------------------------------------------------
>
>                 Key: HDFS-9908
>                 URL: https://issues.apache.org/jira/browse/HDFS-9908
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.5.0
>         Environment: CDH5.3.3
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>
> The DN may treat a disk failure exception as an NN handshake exception, and this 
> can prevent the DN from joining a cluster even if most of its disks are healthy.
> During the NN handshake, the DN initializes block pools. It creates a lock file 
> per disk and then scans the volumes. However, if the scan throws an exception 
> due to a disk failure, the DN treats it as if the NN were inconsistent with the 
> local storage (see {{DataNode#initBlockPool}}). As a result, it attempts to 
> reconnect to the NN.
> However, at this point the DN has not deleted its lock files on the disks. When 
> it reconnects to the NN, it thinks the same disks are already in use, so the 
> handshake fails again because none of the disks can be used (due to locking), 
> and this repeats indefinitely. It happens even if the DN has multiple disks and 
> only one of them fails; the DN cannot connect to the NN despite just one failing 
> disk. Note that it is possible to successfully create a lock file on a disk and 
> then hit an error while scanning it.
> We saw this on a CDH 5.3.3 cluster (which is based on Apache Hadoop 2.5.0, and 
> we still see the same bug on the 3.0.0 trunk branch). The root cause is that the 
> DN treats an internal error (a single disk failure) as an external one (an NN 
> handshake failure), and we should fix that.
> {code:title=DataNode.java}
> /**
>    * One of the Block Pools has successfully connected to its NN.
>    * This initializes the local storage for that block pool,
>    * checks consistency of the NN's cluster ID, etc.
>    * 
>    * If this is the first block pool to register, this also initializes
>    * the datanode-scoped storage.
>    * 
>    * @param bpos Block pool offer service
>    * @throws IOException if the NN is inconsistent with the local storage.
>    */
>   void initBlockPool(BPOfferService bpos) throws IOException {
>     NamespaceInfo nsInfo = bpos.getNamespaceInfo();
>     if (nsInfo == null) {
>       throw new IOException("NamespaceInfo not found: Block pool " + bpos
>           + " should have retrieved namespace info before initBlockPool.");
>     }
>     
>     setClusterId(nsInfo.clusterID, nsInfo.getBlockPoolID());
>     // Register the new block pool with the BP manager.
>     blockPoolManager.addBlockPool(bpos);
>     
>     // In the case that this is the first block pool to connect, initialize
>     // the dataset, block scanners, etc.
>     initStorage(nsInfo);
>     // Exclude failed disks before initializing the block pools to avoid
>     // startup failures.
>     checkDiskError();
>     data.addBlockPool(nsInfo.getBlockPoolID(), conf);  // <----- this line throws the disk error exception
>     blockScanner.enableBlockPoolId(bpos.getBlockPoolId());
>     initDirectoryScanner(conf);
>   }
> {code}
> {{FsVolumeList#addBlockPool}} is the source of the exception.
> {code:title=FsVolumeList.java}
>   void addBlockPool(final String bpid, final Configuration conf) throws 
> IOException {
>     long totalStartTime = Time.monotonicNow();
>     
>     final List<IOException> exceptions = Collections.synchronizedList(
>         new ArrayList<IOException>());
>     List<Thread> blockPoolAddingThreads = new ArrayList<Thread>();
>     for (final FsVolumeImpl v : volumes) {
>       Thread t = new Thread() {
>         public void run() {
>           try (FsVolumeReference ref = v.obtainReference()) {
>             FsDatasetImpl.LOG.info("Scanning block pool " + bpid +
>                 " on volume " + v + "...");
>             long startTime = Time.monotonicNow();
>             v.addBlockPool(bpid, conf);
>             long timeTaken = Time.monotonicNow() - startTime;
>             FsDatasetImpl.LOG.info("Time taken to scan block pool " + bpid +
>                 " on " + v + ": " + timeTaken + "ms");
>           } catch (ClosedChannelException e) {
>             // ignore.
>           } catch (IOException ioe) {
>             FsDatasetImpl.LOG.info("Caught exception while scanning " + v +
>                 ". Will throw later.", ioe);
>             exceptions.add(ioe);
>           }
>         }
>       };
>       blockPoolAddingThreads.add(t);
>       t.start();
>     }
>     for (Thread t : blockPoolAddingThreads) {
>       try {
>         t.join();
>       } catch (InterruptedException ie) {
>         throw new IOException(ie);
>       }
>     }
>     if (!exceptions.isEmpty()) {
>       throw exceptions.get(0); // <----- here is the origin of the exception
>     }
>     
>     long totalTimeTaken = Time.monotonicNow() - totalStartTime;
>     FsDatasetImpl.LOG.info("Total time to scan all replicas for block pool " +
>         bpid + ": " + totalTimeTaken + "ms");
>   }
> {code}
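> One possible direction (a rough sketch only, not a committed patch): make 
> {{addBlockPool}} tolerate a partial failure, e.g. rethrow only when every volume 
> failed and otherwise continue with the healthy volumes; the failed volumes would 
> still need to go through the existing failed-volume handling, which is omitted here.
> {code:title=Sketch of a possible change (not a patch)}
>     if (!exceptions.isEmpty()) {
>       if (exceptions.size() == blockPoolAddingThreads.size()) {
>         // Every volume failed to scan: nothing is usable, so give up as before.
>         throw exceptions.get(0);
>       }
>       // Only a subset of the volumes failed: log and continue with the
>       // healthy volumes instead of aborting the NN handshake.
>       FsDatasetImpl.LOG.warn(exceptions.size()
>           + " volume(s) failed while adding block pool " + bpid
>           + "; continuing with the remaining volumes", exceptions.get(0));
>     }
> {code}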


