[ 
https://issues.apache.org/jira/browse/HDFS-16785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17615096#comment-17615096
 ] 

ASF GitHub Bot commented on HDFS-16785:
---------------------------------------

Hexiaoqiao commented on PR #4945:
URL: https://github.com/apache/hadoop/pull/4945#issuecomment-1273259888

   I totally agree that we should not hold a lock during any IO operation, especially a scan of the whole disk; that could be a terrible disaster even during a volume refresh. (Of course, this does not apply while restarting the DataNode instance.)
   Back to this case: I think the point is that we should only hold the block pool lock (probably the write lock here) when getting/setting the `BlockPoolSlice`, rather than one coarse-grained lock.
   So should we split the following segment and only hold the lock for the `BlockPoolSlice` update, leaving the other logic outside any lock level? I think it would be acceptable if any conflicts or other exceptions affected only the single volume being added.
   ```
         try (AutoCloseDataSetLock l = lockManager.writeLock(LockLevel.BLOCK_POOl, bpid)) {
           fsVolume.addBlockPool(bpid, this.conf, this.timer);
           fsVolume.getVolumeMap(bpid, tempVolumeMap, ramDiskReplicaTracker);
         }
   ```
   WDYT? cc @MingXiangLi @ZanderXu 
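
   To make the suggestion concrete, here is a minimal standalone sketch (not Hadoop code; the class, field, and method names are illustrative only) of the locking split being proposed: perform the expensive disk scan with no lock held, then take the block pool write lock only long enough to publish the result into the in-memory map.
   ```java
   import java.util.HashMap;
   import java.util.Map;
   import java.util.concurrent.locks.ReentrantReadWriteLock;

   /**
    * Illustrative sketch of the lock split discussed above: the slow IO
    * (standing in for getVolumeMap's disk scan) runs outside the lock,
    * and only the in-memory update holds the block pool write lock.
    */
   public class LockSplitSketch {
     private final ReentrantReadWriteLock bpLock = new ReentrantReadWriteLock();
     private final Map<String, Map<Long, String>> replicaMapByBpid = new HashMap<>();

     /** Stand-in for the expensive per-volume disk scan. */
     private Map<Long, String> scanVolume(String bpid) {
       Map<Long, String> blocks = new HashMap<>();
       blocks.put(1L, bpid + "/blk_1");
       blocks.put(2L, bpid + "/blk_2");
       return blocks;
     }

     public void addVolumeForBlockPool(String bpid) {
       // Expensive IO happens with no lock held.
       Map<Long, String> scanned = scanVolume(bpid);
       // Hold the write lock only for the brief in-memory publish.
       bpLock.writeLock().lock();
       try {
         replicaMapByBpid.put(bpid, scanned);
       } finally {
         bpLock.writeLock().unlock();
       }
     }

     public int blockCount(String bpid) {
       bpLock.readLock().lock();
       try {
         Map<Long, String> m = replicaMapByBpid.get(bpid);
         return m == null ? 0 : m.size();
       } finally {
         bpLock.readLock().unlock();
       }
     }
   }
   ```
   Under this pattern, a failure during the scan affects only the volume being added and never blocks readers of other block pools, since no lock is held while the disk is scanned.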




> DataNode hold BP write lock to scan disk
> ----------------------------------------
>
>                 Key: HDFS-16785
>                 URL: https://issues.apache.org/jira/browse/HDFS-16785
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: ZanderXu
>            Assignee: ZanderXu
>            Priority: Major
>              Labels: pull-request-available
>
> While working on the fine-grained locking of the DataNode, I found that `addVolume` 
> will hold the write lock of the BP lock while scanning the new volume to get its 
> blocks. If we try to add a full volume that was previously fixed offline, it 
> will hold the write lock for a long time.
> The related code is as below:
> {code:java}
> for (final NamespaceInfo nsInfo : nsInfos) {
>   String bpid = nsInfo.getBlockPoolID();
>   try (AutoCloseDataSetLock l = lockManager.writeLock(LockLevel.BLOCK_POOl, bpid)) {
>     fsVolume.addBlockPool(bpid, this.conf, this.timer);
>     fsVolume.getVolumeMap(bpid, tempVolumeMap, ramDiskReplicaTracker);
>   } catch (IOException e) {
>     LOG.warn("Caught exception when adding " + fsVolume +
>         ". Will throw later.", e);
>     exceptions.add(e);
>   }
> } {code}
> And I noticed that this lock was added by HDFS-15382, which means that this logic 
> was not under a lock before. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
