[
https://issues.apache.org/jira/browse/HDFS-14311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yicong Cai updated HDFS-14311:
------------------------------
Attachment: (was: HDFS-14311.1.patch)
> multi-threading conflict at layoutVersion when loading block pool storage
> -------------------------------------------------------------------------
>
> Key: HDFS-14311
> URL: https://issues.apache.org/jira/browse/HDFS-14311
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: rolling upgrades
> Affects Versions: 2.9.2
> Reporter: Yicong Cai
> Priority: Major
> Fix For: 3.3.0, 2.9.3
>
> Attachments: HDFS-14311.1.patch
>
>
> When a DataNode is upgraded from 2.7.3 to 2.9.2, there is a multi-threading conflict on
> StorageInfo.layoutVersion while loading the block pool storage.
> It causes this exception:
>
> {panel:title=exceptions}
> 2019-02-15 10:18:01,357 [13783] - INFO [Thread-33:BlockPoolSliceStorage@395] - Restored 36974 block files from trash before the layout upgrade. These blocks will be moved to the previous directory during the upgrade
> 2019-02-15 10:18:01,358 [13784] - WARN [Thread-33:BlockPoolSliceStorage@226] - Failed to analyze storage directories for block pool BP-1216718839-10.120.232.23-1548736842023
> java.io.IOException: Datanode state: LV = -57 CTime = 0 is newer than the namespace state: LV = -63 CTime = 0
> at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:406)
> at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadStorageDirectory(BlockPoolSliceStorage.java:177)
> at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:221)
> at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)
> at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:460)
> at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:390)
> at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:556)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1649)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1610)
> at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:388)
> at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:280)
> at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
> at java.lang.Thread.run(Thread.java:748)
> 2019-02-15 10:18:01,358 [13784] - WARN [Thread-33:DataStorage@472] - Failed to add storage directory [DISK]file:/mnt/dfs/2/hadoop/hdfs/data/ for block pool BP-1216718839-10.120.232.23-1548736842023
> java.io.IOException: Datanode state: LV = -57 CTime = 0 is newer than the namespace state: LV = -63 CTime = 0
> at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:406)
> at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadStorageDirectory(BlockPoolSliceStorage.java:177)
> at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:221)
> at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)
> at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:460)
> at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:390)
> at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:556)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1649)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1610)
> at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:388)
> at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:280)
> at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
> at java.lang.Thread.run(Thread.java:748)
> {panel}
>
> Root cause:
> A single BlockPoolSliceStorage instance is shared by the recover-transition of all storage
> locations. In BlockPoolSliceStorage.doTransition, the old layoutVersion is read from local
> storage and compared with the current DataNode version to decide whether an upgrade is needed.
> doUpgrade then runs the actual transition work in a sub-thread, and that work sets the shared
> BlockPoolSliceStorage's layoutVersion to the current DataNode version. As a result, the
> transition check for the next storage directory runs concurrently with the real transition work
> of the previous directory, and the layoutVersion of the shared BlockPoolSliceStorage instance
> becomes inconsistent.
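> Below is a minimal, self-contained Java sketch of that interleaving. It is not the HDFS source;
> the class and member names (SharedBpStorage, ON_DISK_LV, CURRENT_LV) are made up for
> illustration, and the sub-thread is joined early only to make the unlucky schedule deterministic:
> {code:java}
> public class LayoutVersionRaceSketch {
>
>   // Stands in for the single BlockPoolSliceStorage instance shared by every
>   // storage directory; the name and field are illustrative only.
>   static class SharedBpStorage {
>     int layoutVersion;   // shared and mutated without synchronization, as in the bug
>   }
>
>   static final int ON_DISK_LV = -57;   // layout version read from a 2.7.3 directory
>   static final int CURRENT_LV = -63;   // layout version of the upgraded 2.9.2 DataNode
>
>   public static void main(String[] args) throws Exception {
>     SharedBpStorage shared = new SharedBpStorage();
>
>     // Transition check for storage directory 2: read the old layout from disk.
>     shared.layoutVersion = ON_DISK_LV;
>
>     // Meanwhile the upgrade of storage directory 1, running as a sub-thread,
>     // rewrites the same shared field with the current DataNode version.
>     Thread upgradeDir1 = new Thread(() -> shared.layoutVersion = CURRENT_LV);
>     upgradeDir1.start();
>     upgradeDir1.join();   // force the unlucky interleaving so the demo is deterministic
>
>     // Directory 2 now evaluates its upgrade condition against the overwritten
>     // value: it no longer looks older than CURRENT_LV, so instead of scheduling
>     // an upgrade it falls into the "newer than the namespace state" error path.
>     if (shared.layoutVersion > CURRENT_LV) {
>       System.out.println("dir2 looks older than the DataNode -> upgrade (expected)");
>     } else {
>       throw new java.io.IOException("Datanode state: LV = " + shared.layoutVersion
>           + " is newer than the namespace state: LV = " + CURRENT_LV);
>     }
>   }
> }
> {code}
> With the early join removed, the check and the upgrade sub-thread race for real and the outcome
> becomes timing-dependent, which is the essence of the conflict described above.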
>