[ https://issues.apache.org/jira/browse/HDFS-14311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911692#comment-16911692 ]
Hudson commented on HDFS-14311:
-------------------------------
FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17152 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17152/])
HDFS-14311. Multi-threading conflict at layoutVersion when loading block (weichiu: rev 4cb22cd867a9295efc815dc95525b5c3e5960ea6)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataStorage.java
> Multi-threading conflict at layoutVersion when loading block pool storage
> -------------------------------------------------------------------------
>
> Key: HDFS-14311
> URL: https://issues.apache.org/jira/browse/HDFS-14311
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: rolling upgrades
> Affects Versions: 2.9.2
> Reporter: Yicong Cai
> Assignee: Yicong Cai
> Priority: Major
> Fix For: 2.10.0, 3.3.0, 2.8.6, 3.2.1, 2.9.3, 3.1.3
>
> Attachments: HDFS-14311.1.patch, HDFS-14311.2.patch, HDFS-14311.branch-2.1.patch
>
>
> When a DataNode is upgraded from 2.7.3 to 2.9.2, there is a conflict on
> StorageInfo.layoutVersion while the block pool storage is being loaded.
> It causes this exception:
>
> {panel:title=exceptions}
> 2019-02-15 10:18:01,357 [13783] - INFO [Thread-33:BlockPoolSliceStorage@395] - Restored 36974 block files from trash before the layout upgrade. These blocks will be moved to the previous directory during the upgrade
> 2019-02-15 10:18:01,358 [13784] - WARN [Thread-33:BlockPoolSliceStorage@226] - Failed to analyze storage directories for block pool BP-1216718839-10.120.232.23-1548736842023
> java.io.IOException: Datanode state: LV = -57 CTime = 0 is newer than the namespace state: LV = -63 CTime = 0
>     at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:406)
>     at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadStorageDirectory(BlockPoolSliceStorage.java:177)
>     at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:221)
>     at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)
>     at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:460)
>     at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:390)
>     at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:556)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1649)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1610)
>     at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:388)
>     at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:280)
>     at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
>     at java.lang.Thread.run(Thread.java:748)
> 2019-02-15 10:18:01,358 [13784] - WARN [Thread-33:DataStorage@472] - Failed to add storage directory [DISK]file:/mnt/dfs/2/hadoop/hdfs/data/ for block pool BP-1216718839-10.120.232.23-1548736842023
> java.io.IOException: Datanode state: LV = -57 CTime = 0 is newer than the namespace state: LV = -63 CTime = 0
>     at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:406)
>     at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadStorageDirectory(BlockPoolSliceStorage.java:177)
>     at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:221)
>     at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)
>     at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:460)
>     at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:390)
>     at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:556)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1649)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1610)
>     at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:388)
>     at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:280)
>     at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
>     at java.lang.Thread.run(Thread.java:748)
> {panel}
>
> Root cause:
> A single BlockPoolSliceStorage instance is shared by the recover-transition
> work of all storage locations. BlockPoolSliceStorage.doTransition reads the
> old layoutVersion from local storage, compares it with the current DataNode
> version, and then starts an upgrade if needed. doUpgrade hands the actual
> transition work to a sub-thread, and that work sets the shared
> BlockPoolSliceStorage's layoutVersion to the current DN version. As a result,
> the transition check for the next storage directory runs concurrently with
> the previous directory's real transition work, and the layoutVersion of the
> shared BlockPoolSliceStorage instance becomes inconsistent. (A simplified
> sketch of this race follows after the quoted description.)
>
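The following Java sketch illustrates the shared-state race described in the quoted root cause. It is a simplified, hypothetical model rather than the actual HDFS code: the class SharedPoolStorage, the constants ON_DISK_LV and SOFTWARE_LV, and the hard-coded directory list are illustrative stand-ins for the shared BlockPoolSliceStorage, and the real doTransition/doUpgrade checks are more involved. The point is only that a sub-thread mutates a field that the checks for later storage directories still read.

{code:java}
// Hypothetical, simplified model of the race; these are not the real HDFS classes.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LayoutVersionRaceSketch {

  // Stand-in for the single BlockPoolSliceStorage instance shared by every
  // storage directory of one block pool.
  static class SharedPoolStorage {
    volatile int layoutVersion;
    SharedPoolStorage(int lv) { this.layoutVersion = lv; }
  }

  // Illustrative values only (HDFS layout versions are negative and decrease
  // as the layout gets newer); they are not meant to match the log above.
  static final int ON_DISK_LV = -56;   // older layout found on disk
  static final int SOFTWARE_LV = -57;  // newer layout of the running DataNode

  public static void main(String[] args) throws InterruptedException {
    SharedPoolStorage storage = new SharedPoolStorage(ON_DISK_LV);
    List<Thread> upgradeThreads = new ArrayList<>();

    for (String dir : Arrays.asList("/mnt/dfs/1", "/mnt/dfs/2", "/mnt/dfs/3")) {
      // "doTransition" for this directory: inspect the shared layoutVersion.
      int lvSeenByCheck = storage.layoutVersion;
      if (lvSeenByCheck > SOFTWARE_LV) {
        // Old layout observed: hand the real transition work to a sub-thread,
        // which mutates the *shared* layoutVersion while the loop moves on.
        Thread t = new Thread(() -> storage.layoutVersion = SOFTWARE_LV);
        t.start();
        upgradeThreads.add(t);
        System.out.println(dir + ": upgrade scheduled (check saw LV " + lvSeenByCheck + ")");
      } else {
        // A previous directory's sub-thread already bumped the shared value,
        // so this directory's check no longer sees the on-disk layout it
        // actually has -- the kind of confusion behind the "is newer than
        // the namespace state" failure in the log above.
        System.out.println(dir + ": check confused, shared LV already " + lvSeenByCheck);
      }
    }
    for (Thread t : upgradeThreads) {
      t.join();
    }
  }
}
{code}

Because the outcome depends on thread scheduling, directories after the first may or may not observe the already-updated value, which is why a shared, mutable layoutVersion cannot safely be checked while per-directory upgrade work runs in sub-threads.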