[
https://issues.apache.org/jira/browse/HDFS-16804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18034532#comment-18034532
]
ASF GitHub Bot commented on HDFS-16804:
---------------------------------------
github-actions[bot] commented on PR #5033:
URL: https://github.com/apache/hadoop/pull/5033#issuecomment-3475292111
We're closing this stale PR because it has been open for 100 days with no
activity. This isn't a judgement on the merit of the PR in any way. It's just a
way of keeping the PR queue manageable.
If you feel like this was a mistake, or you would like to continue working
on it, please feel free to re-open it and ask for a committer to remove the
stale tag and review again.
Thanks all for your contribution.
> AddVolume contains a race condition with shutdown block pool
> ------------------------------------------------------------
>
> Key: HDFS-16804
> URL: https://issues.apache.org/jira/browse/HDFS-16804
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: ZanderXu
> Assignee: ZanderXu
> Priority: Major
> Labels: pull-request-available
>
> Add Volume contains a race condition with shutdown block pool, causing the
> ReplicaMap still contains some blocks belong to the removed block pool.
> And the new volume still contains one unused BlockPoolSlice belongs to the
> removed block pool, caused some problems, such as: incorrect dfsUsed,
> incorrect numBlocks of the volume.
> Let's review the logic of addVolume and shutdownBlockPool respectively.
>
> AddVolume Logic:
> * Step1: Get all namespaceInfo from blockPoolManager
> * Step2: Create one temporary FsVolumeImpl object
> * Step3: Create some blockPoolSlice according to the namespaceInfo and add
> them to the temporary FsVolumeImpl object
> * Step4: Scan all blocks of the namespaceInfo from the volume and store them
> by one temporary ReplicaMap
> * Step5: Active the temporary FsVolumeImpl which created before (with
> FsDatasetImpl synchronized lock)
> ** Step5.1: Merge all blocks of the temporary ReplicaMap to the global
> ReplicaMap
> ** Step5.2: Add the FsVolumeImpl to the volumes
> ShutdownBlockPool Logic:(with blockPool write lock)
> * Step1: Cleanup the blockPool from the global ReplicaMap
> * Step2: Shutdown the block pool from all the volumes
> ** Step2.1: do some clean operations for the block pool, such as
> saveReplica, saveDfsUsed, etc
> ** Step2.2: remove the blockPool from bpSlices
> The race condition can be reproduced by the following steps:
> * AddVolume Step1: Get all namespaceInfo from blockPoolManager
> * ShutdownBlockPool Step1: Cleanup the blockPool from the global ReplicaMap
> * ShutdownBlockPool Step2: Shutdown the block pool from all the volumes
> * AddVolume Step 2~5
> And result:
> * The global replicaMap contains some blocks belong to the removed blockPool
> * The bpSlices of the FsVolumeImpl contains one blockPoolSlice belong to the
> removed blockPool
> Expected result:
> * The global replicaMap shouldn't contain any blocks belong to the removed
> blockPool
> * The bpSlices of any FsVolumeImpl shouldn't contain any blockPoolSlice
> belong to the removed blockPool
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]