[ 
https://issues.apache.org/jira/browse/HDFS-16804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626922#comment-17626922
 ] 

ASF GitHub Bot commented on HDFS-16804:
---------------------------------------

ZanderXu commented on code in PR #5033:
URL: https://github.com/apache/hadoop/pull/5033#discussion_r1010020543


##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/ReplicaMap.java:
##########
@@ -166,25 +167,24 @@ void addAll(ReplicaMap other) {
   /**
    * Merge all entries from the given replica map into the local replica map.
    */
-  void mergeAll(ReplicaMap other) {
+  void mergeAll(ReplicaMap other) throws IOException {
     Set<String> bplist = other.map.keySet();
     for (String bp : bplist) {
       checkBlockPool(bp);
       try (AutoCloseDataSetLock l = 
lockManager.writeLock(LockLevel.BLOCK_POOl, bp)) {
         LightWeightResizableGSet<Block, ReplicaInfo> replicaInfos = 
other.map.get(bp);
         LightWeightResizableGSet<Block, ReplicaInfo> curSet = map.get(bp);
+        if (curSet == null) {
+          // Can't find the block pool id in the replicaMap. Maybe it has been 
removed.

Review Comment:
   @DaveTeng0 Thanks for your review.
   
   > is it possible we can't find block pool id from the map, but the block 
pool is not removed yet?
   
   I think it's impossible. If you find some cases, please share.
   
   1. The `mergeAll` method only been used by the `activateVolume` method with 
a `synchronized` lock.
   2. The `shutdownBlockPool` method only been used by the `shutdownBlockPool` 
method with a `synchronized` lock, which beed added in this PR.
   3. The `addBlockPool` method always initialize an empty 
`LightWeightResizableGSet` in the global `ReplicaMap` for this block pool.





> AddVolume contains a race condition with shutdown block pool
> ------------------------------------------------------------
>
>                 Key: HDFS-16804
>                 URL: https://issues.apache.org/jira/browse/HDFS-16804
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: ZanderXu
>            Assignee: ZanderXu
>            Priority: Major
>              Labels: pull-request-available
>
> Add Volume contains a race condition with shutdown block pool, causing the 
> ReplicaMap still contains some blocks belong to the removed block pool.
> And the new volume still contains one unused BlockPoolSlice belongs to the 
> removed block pool, caused some problems, such as: incorrect dfsUsed, 
> incorrect numBlocks of the volume.
> Let's review the logic of addVolume and shutdownBlockPool respectively.
>  
> AddVolume Logic:
>  * Step1: Get all namespaceInfo from blockPoolManager
>  * Step2: Create one temporary FsVolumeImpl object
>  * Step3: Create some blockPoolSlice according to the namespaceInfo and add 
> them to the temporary FsVolumeImpl object
>  * Step4: Scan all blocks of the namespaceInfo from the volume and store them 
> by one temporary ReplicaMap
>  * Step5: Active the temporary FsVolumeImpl which created before (with 
> FsDatasetImpl synchronized lock)
>  ** Step5.1: Merge all blocks of the temporary ReplicaMap to the global 
> ReplicaMap
>  ** Step5.2: Add the FsVolumeImpl to the volumes
> ShutdownBlockPool Logic:(with blockPool write lock)
>  * Step1: Cleanup the blockPool from the global ReplicaMap
>  * Step2: Shutdown the block pool from all the volumes
>  ** Step2.1: do some clean operations for the block pool, such as 
> saveReplica, saveDfsUsed, etc
>  ** Step2.2: remove the blockPool from bpSlices
> The race condition can be reproduced by the following steps:
>  * AddVolume Step1: Get all namespaceInfo from blockPoolManager
>  * ShutdownBlockPool Step1: Cleanup the blockPool from the global ReplicaMap
>  * ShutdownBlockPool Step2: Shutdown the block pool from all the volumes
>  * AddVolume Step 2~5
> And result:
>  * The global replicaMap contains some blocks belong to the removed blockPool
>  * The bpSlices of the FsVolumeImpl contains one blockPoolSlice belong to the 
> removed blockPool
> Expected result:
>  * The global replicaMap shouldn't contain any blocks belong to the removed 
> blockPool
>  * The bpSlices of any FsVolumeImpl shouldn't contain any blockPoolSlice 
> belong to the removed blockPool



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to