[
https://issues.apache.org/jira/browse/HDFS-16785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610808#comment-17610808
]
ASF GitHub Bot commented on HDFS-16785:
---------------------------------------
MingXiangLi commented on PR #4945:
URL: https://github.com/apache/hadoop/pull/4945#issuecomment-1261780786
> Great point. +1 from my side. So volume level lock will be more proper?
Volume level is OK. It will hold the block pool read lock, which avoids the
case 2 conflict.
> Not sure if this is block issue. This improvement seems not involve global
meta management, only prepare phase here? So any conflict here?
addVolume() may be invoked by a configuration refresh, so it may conflict with
FsDatasetImpl#shutdownBlockPool. There are two stages in
FsDatasetImpl#shutdownBlockPool. For example, a conflict can occur with an
interleaving like this:
`
FsDatasetImpl#shutdownBlockPool*volumeMap.cleanUpBlockPool(bpid);
FsDatasetImpl#addVolume*fsVolume.getVolumeMap(bpid, tempVolumeMap,
ramDiskReplicaTracker)
FsDatasetImpl#shutdownBlockPool*volumes.removeBlockPool(bpid,
blocksPerVolume);
`
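To make the interleaving above concrete, here is a minimal Java sketch of why a read lock in addVolume() is enough to keep the scan out of the gap between the two shutdown stages. This is not the FsDatasetImpl code: the class `BlockPoolLockSketch`, the field `bpLock`, and the event strings are hypothetical stand-ins, a plain `ReentrantReadWriteLock` replaces the DataNode lock manager, and it assumes shutdownBlockPool runs both stages under the block pool write lock.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of the two code paths; not the real FsDatasetImpl.
final class BlockPoolLockSketch {
    // Stand-in for the per-block-pool lock (assumption: one lock per bpid).
    private final ReentrantReadWriteLock bpLock = new ReentrantReadWriteLock();
    // Records the order in which the three steps actually ran.
    private final List<String> events =
        Collections.synchronizedList(new ArrayList<>());

    // Both stages run under the write lock (assumed), so no reader can
    // observe the state between them.
    void shutdownBlockPool() {
        bpLock.writeLock().lock();
        try {
            events.add("cleanUpBlockPool");
            try {
                Thread.sleep(50); // widen the window between the two stages
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            events.add("removeBlockPool");
        } finally {
            bpLock.writeLock().unlock();
        }
    }

    // The scan only needs the read lock: it blocks while a shutdown is in
    // progress, but does not exclude other readers.
    void addVolume() {
        bpLock.readLock().lock();
        try {
            events.add("getVolumeMap");
        } finally {
            bpLock.readLock().unlock();
        }
    }

    // Runs both paths concurrently and returns the observed order.
    static List<String> runOnce() {
        BlockPoolLockSketch s = new BlockPoolLockSketch();
        Thread shutdown = new Thread(s::shutdownBlockPool);
        Thread add = new Thread(s::addVolume);
        shutdown.start();
        add.start();
        try {
            shutdown.join();
            add.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(s.events);
    }

    public static void main(String[] args) {
        System.out.println(runOnce());
    }
}
```

Whichever thread wins the race, getVolumeMap lands either before cleanUpBlockPool or after removeBlockPool, never between them.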
> DataNode hold BP write lock to scan disk
> ----------------------------------------
>
> Key: HDFS-16785
> URL: https://issues.apache.org/jira/browse/HDFS-16785
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: ZanderXu
> Assignee: ZanderXu
> Priority: Major
> Labels: pull-request-available
>
> When patching the fine-grained locking of datanode, I found that `addVolume`
> will hold the BP write lock to scan the new volume to get the
> blocks. If we try to add one full volume that was fixed offline before, it
> will hold the write lock for a long time.
> The related code is as follows:
> {code:java}
> for (final NamespaceInfo nsInfo : nsInfos) {
>   String bpid = nsInfo.getBlockPoolID();
>   try (AutoCloseDataSetLock l = lockManager.writeLock(LockLevel.BLOCK_POOl,
>       bpid)) {
>     fsVolume.addBlockPool(bpid, this.conf, this.timer);
>     fsVolume.getVolumeMap(bpid, tempVolumeMap, ramDiskReplicaTracker);
>   } catch (IOException e) {
>     LOG.warn("Caught exception when adding " + fsVolume +
>         ". Will throw later.", e);
>     exceptions.add(e);
>   }
> }
> {code}
> I also noticed that this lock was added by HDFS-15382, which means that this
> logic was not under a lock before.