[ https://issues.apache.org/jira/browse/HDFS-14311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904047#comment-16904047 ]
Stephen O'Donnell commented on HDFS-14311:
------------------------------------------

Thanks for the patch [~caiyicong], this is a good discovery. I suspect the reason this has not come up before is that it likely only happens when the DataNode volumes hold a very small number of blocks.

The current code path iterates over each storage directory; if a directory needs to be upgraded, it returns a callable, which is submitted to an executor, and then the next directory is checked. Inside the callable, the storage is upgraded first, and then the BlockPoolSliceStorage instance variables are updated. If the storage upgrade completes very quickly, the first callable will change the instance variables in BlockPoolSliceStorage, and the later storage directories will hit the error you mentioned. If upgrading a storage directory takes longer than creating all the callables, which is likely when there are many blocks on the storage, this issue does not manifest.

If I understand correctly, your patch works around the problem by creating and collecting all the 'upgrade callables' first, and submitting them to the executor only after all of them have been created. That way, it does not matter when the BlockPoolSliceStorage variables are updated. Given the current structure of the code, and how the layout version and ctime are used within BlockPoolSliceStorage, I think your patch is the best way of fixing this; anything else would require a lot more refactoring.

I have just a few comments:
# I don't believe any of the test failures are related to this change.
# Could you address the checkstyle issues highlighted in the last run please?
# I wonder if we could think of a way to add a test for this, to at least reproduce the issue. It could be tricky due to the timing of things, but if we create a single DN with quite a few storage directories at an older layout version and then upgrade them, it may be possible.
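The collect-then-submit pattern described above can be sketched as follows. This is a minimal illustration with hypothetical names, not the actual Hadoop patch: it only shows the two-phase structure in which every callable is constructed before any of them is allowed to run, so a fast-finishing upgrade task can no longer influence how a later directory's callable is built.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Sketch (hypothetical names) of the patch's approach: build every
 * upgrade Callable first, then submit them all to the executor.
 */
public class CollectThenSubmit {

  /** Pretend every directory needs an upgrade; return one result per dir. */
  static List<String> runUpgrades(List<String> storageDirs) throws Exception {
    // Phase 1: create ALL callables before any upgrade runs. Each callable
    // is constructed from per-directory, on-disk state, before any shared
    // in-memory state (layout version, ctime) has been mutated.
    List<Callable<String>> callables = new ArrayList<>();
    for (String dir : storageDirs) {
      callables.add(() -> "upgraded " + dir);
    }

    // Phase 2: only now submit them. Whatever the running tasks do to
    // shared state, every callable already exists, so later directories
    // can no longer be misjudged against an already-bumped version.
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      List<Future<String>> futures = new ArrayList<>();
      for (Callable<String> c : callables) {
        futures.add(pool.submit(c));
      }
      List<String> results = new ArrayList<>();
      for (Future<String> f : futures) {
        results.add(f.get()); // collect in submission order
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(runUpgrades(List.of("/data/1", "/data/2")));
  }
}
```

The key point is simply that no task is submitted until the loop that inspects shared state has finished, which is exactly why the ordering of the tasks' updates stops mattering.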
> multi-threading conflict at layoutVersion when loading block pool storage
> -------------------------------------------------------------------------
>
>                 Key: HDFS-14311
>                 URL: https://issues.apache.org/jira/browse/HDFS-14311
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: rolling upgrades
>    Affects Versions: 2.9.2
>            Reporter: Yicong Cai
>            Assignee: Yicong Cai
>            Priority: Major
>         Attachments: HDFS-14311.1.patch
>
> When the DataNode upgrades from 2.7.3 to 2.9.2, there is a conflict on
> StorageInfo.layoutVersion while loading the block pool storage.
> It causes this exception:
>
> {panel:title=exceptions}
> 2019-02-15 10:18:01,357 [13783] - INFO [Thread-33:BlockPoolSliceStorage@395]
> - Restored 36974 block files from trash before the layout upgrade. These
> blocks will be moved to the previous directory during the upgrade
> 2019-02-15 10:18:01,358 [13784] - WARN [Thread-33:BlockPoolSliceStorage@226]
> - Failed to analyze storage directories for block pool
> BP-1216718839-10.120.232.23-1548736842023
> java.io.IOException: Datanode state: LV = -57 CTime = 0 is newer than the
> namespace state: LV = -63 CTime = 0
> at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:406)
> at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadStorageDirectory(BlockPoolSliceStorage.java:177)
> at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:221)
> at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)
> at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:460)
> at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:390)
> at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:556)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1649)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1610)
> at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:388)
> at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:280)
> at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
> at java.lang.Thread.run(Thread.java:748)
> 2019-02-15 10:18:01,358 [13784] - WARN [Thread-33:DataStorage@472] - Failed
> to add storage directory [DISK]file:/mnt/dfs/2/hadoop/hdfs/data/ for block
> pool BP-1216718839-10.120.232.23-1548736842023
> java.io.IOException: Datanode state: LV = -57 CTime = 0 is newer than the
> namespace state: LV = -63 CTime = 0
> at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:406)
> at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadStorageDirectory(BlockPoolSliceStorage.java:177)
> at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:221)
> at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)
> at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:460)
> at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:390)
> at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:556)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1649)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1610)
> at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:388)
> at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:280)
> at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
> at java.lang.Thread.run(Thread.java:748)
> {panel}
>
> root cause:
> The BlockPoolSliceStorage instance is shared across all storage locations
> during the recover-transition process. In BlockPoolSliceStorage.doTransition,
> it reads the old layoutVersion from local storage, compares it with the
> current DataNode version, and then performs the upgrade. doUpgrade runs the
> transition work in a sub-thread, and that work sets the
> BlockPoolSliceStorage's layoutVersion to the current DN version. The
> transition check for the next storage dir therefore runs concurrently with
> the real transition work for the previous dir, leaving the shared instance's
> layoutVersion in an inconsistent state.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org