[ https://issues.apache.org/jira/browse/HDFS-14311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911074#comment-16911074 ]

Yicong Cai commented on HDFS-14311:
-----------------------------------

Thanks [~sodonnell] [~surendrasingh] [~jojochuang] for your attention and 
review on this issue. 

It is very difficult to reproduce this with a unit test (UT); my attempts so far
have failed. For now I have fixed the checkstyle issues, and I will keep trying
to reproduce the problem with a UT.

> multi-threading conflict at layoutVersion when loading block pool storage
> -------------------------------------------------------------------------
>
>                 Key: HDFS-14311
>                 URL: https://issues.apache.org/jira/browse/HDFS-14311
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: rolling upgrades
>    Affects Versions: 2.9.2
>            Reporter: Yicong Cai
>            Assignee: Yicong Cai
>            Priority: Major
>         Attachments: HDFS-14311.1.patch, HDFS-14311.2.patch, 
> HDFS-14311.branch-2.1.patch
>
>
> When a DataNode is upgraded from 2.7.3 to 2.9.2, there is a conflict on
> StorageInfo.layoutVersion while the block pool storage is being loaded.
> It causes the following exception:
>  
> {panel:title=exceptions}
> 2019-02-15 10:18:01,357 [13783] - INFO [Thread-33:BlockPoolSliceStorage@395] - Restored 36974 block files from trash before the layout upgrade. These blocks will be moved to the previous directory during the upgrade
> 2019-02-15 10:18:01,358 [13784] - WARN [Thread-33:BlockPoolSliceStorage@226] - Failed to analyze storage directories for block pool BP-1216718839-10.120.232.23-1548736842023
> java.io.IOException: Datanode state: LV = -57 CTime = 0 is newer than the namespace state: LV = -63 CTime = 0
>  at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:406)
>  at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadStorageDirectory(BlockPoolSliceStorage.java:177)
>  at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:221)
>  at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)
>  at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:460)
>  at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:390)
>  at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:556)
>  at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1649)
>  at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1610)
>  at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:388)
>  at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:280)
>  at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
>  at java.lang.Thread.run(Thread.java:748)
> 2019-02-15 10:18:01,358 [13784] - WARN [Thread-33:DataStorage@472] - Failed to add storage directory [DISK]file:/mnt/dfs/2/hadoop/hdfs/data/ for block pool BP-1216718839-10.120.232.23-1548736842023
> java.io.IOException: Datanode state: LV = -57 CTime = 0 is newer than the namespace state: LV = -63 CTime = 0
>  at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:406)
>  at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadStorageDirectory(BlockPoolSliceStorage.java:177)
>  at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:221)
>  at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)
>  at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:460)
>  at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:390)
>  at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:556)
>  at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1649)
>  at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1610)
>  at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:388)
>  at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:280)
>  at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
>  at java.lang.Thread.run(Thread.java:748)
> {panel}
>  
> Root cause:
> A single BlockPoolSliceStorage instance is shared by the recover-transition
> of all storage locations. BlockPoolSliceStorage.doTransition reads the old
> layoutVersion from the local storage directory, compares it with the current
> DataNode layout version, and then performs the upgrade. doUpgrade hands the
> transition work to a sub-thread, and that work sets the shared
> BlockPoolSliceStorage's layoutVersion to the current DN version. The
> transition check for the next storage directory therefore runs concurrently
> with the actual transition work of the previous storage directory, and the
> layoutVersion of the shared BlockPoolSliceStorage instance becomes
> inconsistent.
>  
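The race in the root cause above can be illustrated with a heavily simplified, hypothetical Java model. The class LayoutVersionRace, its fields, the constants (-57 assumed to be the current DataNode LV, -56 assumed to be the pre-upgrade on-disk LV) and the branch order of the check are all assumptions loosely patterned on the check described for BlockPoolSliceStorage.doTransition; this is not the real Hadoop code. A CountDownLatch forces the bad interleaving deterministically: the checking thread reads an older on-disk LV, and a second thread, standing in for the previous directory's background upgrade, overwrites the shared layoutVersion between two comparisons, so every branch fails and the check reports the new LV as "too new". Forcing the interleaving this way, rather than relying on timing, may be one way to make such a race reproducible in a unit test.

```java
import java.util.concurrent.CountDownLatch;

/**
 * Hypothetical, simplified model of the layoutVersion race in HDFS-14311.
 * Names, constants and branch order are illustrative assumptions; this is
 * NOT the real BlockPoolSliceStorage implementation.
 */
final class LayoutVersionRace {
  static final int DN_LAYOUT_VERSION = -57;  // assumed current DataNode LV
  static final int OLD_LAYOUT_VERSION = -56; // assumed on-disk LV before upgrade

  /** Stands in for the single instance shared by all storage directories. */
  static final class SharedStorageInfo {
    volatile int layoutVersion;
    volatile long cTime;
  }

  /** Test hook run between the first and the later comparisons. */
  interface Hook {
    void run() throws InterruptedException;
  }

  /** Racy variant: each branch re-reads the shared fields. */
  static String racyCheck(SharedStorageInfo s, long nsCTime, Hook between)
      throws InterruptedException {
    if (s.layoutVersion == DN_LAYOUT_VERSION && s.cTime == nsCTime) {
      return "NORMAL";
    }
    between.run(); // another thread may change s.layoutVersion here
    if (s.layoutVersion > DN_LAYOUT_VERSION) {
      return "UPGRADE"; // on-disk layout is older than the software
    }
    if (s.layoutVersion == DN_LAYOUT_VERSION && s.cTime < nsCTime) {
      return "UPGRADE";
    }
    return "ERROR (LV = " + s.layoutVersion + ", CTime = " + s.cTime + ")";
  }

  /** Mitigation in this model only: read the shared fields once into locals. */
  static String snapshotCheck(SharedStorageInfo s, long nsCTime, Hook between)
      throws InterruptedException {
    final int lv = s.layoutVersion;
    final long cTime = s.cTime;
    if (lv == DN_LAYOUT_VERSION && cTime == nsCTime) {
      return "NORMAL";
    }
    between.run(); // later writes to the shared object no longer affect lv/cTime
    if (lv > DN_LAYOUT_VERSION) {
      return "UPGRADE";
    }
    if (lv == DN_LAYOUT_VERSION && cTime < nsCTime) {
      return "UPGRADE";
    }
    return "ERROR (LV = " + lv + ", CTime = " + cTime + ")";
  }

  /** Forces the bad interleaving deterministically with a latch. */
  static String runInterleaving(boolean useSnapshot) {
    SharedStorageInfo shared = new SharedStorageInfo();
    shared.layoutVersion = OLD_LAYOUT_VERSION; // next directory just read from disk
    shared.cTime = 0L;

    CountDownLatch upgradeWritten = new CountDownLatch(1);
    // Plays the previous directory's background upgrade work.
    Thread upgrader = new Thread(() -> {
      shared.layoutVersion = DN_LAYOUT_VERSION;
      upgradeWritten.countDown();
    });
    Hook hook = () -> {
      upgrader.start();
      upgradeWritten.await(); // wait until the shared field has been overwritten
    };
    try {
      String result = useSnapshot
          ? snapshotCheck(shared, 0L, hook)
          : racyCheck(shared, 0L, hook);
      upgrader.join();
      return result;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("racy:     " + runInterleaving(false)); // ERROR (LV = -57, CTime = 0)
    System.out.println("snapshot: " + runInterleaving(true));  // UPGRADE
  }
}
```

In this model the racy variant falls through every branch and reports LV = -57, the same value seen in the log, even though the directory actually needs an upgrade. Reading the fields once only narrows the window inside this toy check; a real fix would more plausibly avoid mutating the shared instance from the background upgrade thread or keep per-directory state, and the committed patch may take a different approach.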



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
