ZanderXu created HDFS-16804:
-------------------------------
Summary: AddVolume contains a race condition with shutdown block
pool
Key: HDFS-16804
URL: https://issues.apache.org/jira/browse/HDFS-16804
Project: Hadoop HDFS
Issue Type: Bug
Reporter: ZanderXu
Assignee: ZanderXu
Add Volume contains a race condition with shutdown block pool, causing the
ReplicaMap still contains some blocks belong to the removed block pool.
And the new volume still contains one unused BlockPoolSlice belongs to the
removed block pool, caused some problems, such as: incorrect dfsUsed, incorrect
numBlocks of the volume.
Let's review the logic of addVolume and shutdownBlockPool respectively.
AddVolume Logic:
* Step1: Get all namespaceInfo from blockPoolManager
* Step2: Create one temporary FsVolumeImpl object
* Step3: Create some blockPoolSlice according to the namespaceInfo and add
them to the temporary FsVolumeImpl object
* Step4: Scan all blocks of the namespaceInfo from the volume and store them
by one temporary ReplicaMap
* Step5: Active the temporary FsVolumeImpl which created before (with
FsDatasetImpl synchronized lock)
** Step5.1: Merge all blocks of the temporary ReplicaMap to the global
ReplicaMap
** Step5.2: Add the FsVolumeImpl to the volumes
ShutdownBlockPool Logic:(with blockPool write lock)
* Step1: Cleanup the blockPool from the global ReplicaMap
* Step2: Shutdown the block pool from all the volumes
** Step2.1: do some clean operations for the block pool, such as saveReplica,
saveDfsUsed, etc
** Step2.2: remove the blockPool from bpSlices
The race condition can be reproduced by the following steps:
* AddVolume Step1: Get all namespaceInfo from blockPoolManager
* ShutdownBlockPool Step1: Cleanup the blockPool from the global ReplicaMap
* ShutdownBlockPool Step2: Shutdown the block pool from all the volumes
* AddVolume Step 2~5
And result:
* The global replicaMap contains some blocks belong to the removed blockPool
* The bpSlices of the FsVolumeImpl contains one blockPoolSlice belong to the
removed blockPool
Expected result:
* The global replicaMap shouldn't contain any blocks belong to the removed
blockPool
* The bpSlices of any FsVolumeImpl shouldn't contain any blockPoolSlice belong
to the removed blockPool
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]