Xiangyi Zhu created HDFS-17191:
----------------------------------
Summary: HDFS: Delete operation adds a thread to collect blocks
asynchronously
Key: HDFS-17191
URL: https://issues.apache.org/jira/browse/HDFS-17191
Project: Hadoop HDFS
Issue Type: Improvement
Components: hdfs
Affects Versions: 3.4.0
Reporter: Xiangyi Zhu
Assignee: Xiangyi Zhu
When we delete a large directory, collecting the blocks in the deleted subtree
is time-consuming. Currently, block collection is executed while holding the
write lock, so deleting a large directory may block other RPCs for a period of
time. Asynchronous deletion of the collected blocks has already been
implemented, and we can use it as a reference.
In fact, collecting blocks does not require locking, because once the subtree
has been deleted, it can no longer be accessed by other RPCs. We can therefore
collect the blocks of the deleted subtree asynchronously and without locking.
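The idea above can be sketched roughly as follows. This is only a minimal
illustration, not the actual NameNode code: Node, AsyncDeleteSketch, nsLock and
the "block-collector" thread are hypothetical names, and the real INode and
FSNamesystem classes are considerably more involved. The subtree is detached
under the write lock, and the traversal that collects its blocks runs on a
separate thread after the lock has been released.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for an inode; not the real HDFS INode class.
final class Node {
  final String name;
  final List<Node> children = new ArrayList<>();
  final List<Long> blockIds = new ArrayList<>();

  Node(String name) {
    this.name = name;
  }
}

// Hypothetical sketch of "detach under lock, collect off lock".
final class AsyncDeleteSketch {
  private final ReentrantReadWriteLock nsLock = new ReentrantReadWriteLock();

  // Single daemon thread that performs the expensive traversal.
  private final ExecutorService collector =
      Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "block-collector");
        t.setDaemon(true);
        return t;
      });

  CompletableFuture<List<Long>> delete(Node parent, Node subtree) {
    nsLock.writeLock().lock();
    try {
      // Only the cheap detach step happens while the write lock is held.
      if (!parent.children.remove(subtree)) {
        return CompletableFuture.completedFuture(new ArrayList<>());
      }
    } finally {
      nsLock.writeLock().unlock();
    }
    // The subtree is now unreachable from the namespace, so it is
    // traversed without holding any lock.
    return CompletableFuture.supplyAsync(() -> collectBlocks(subtree), collector);
  }

  // Iterative depth-first traversal that gathers every block id in the subtree.
  static List<Long> collectBlocks(Node root) {
    List<Long> blocks = new ArrayList<>();
    Deque<Node> stack = new ArrayDeque<>();
    stack.push(root);
    while (!stack.isEmpty()) {
      Node n = stack.pop();
      blocks.addAll(n.blockIds);
      for (Node child : n.children) {
        stack.push(child);
      }
    }
    return blocks;
  }
}
```

In this sketch the caller receives a future and can hand the collected block
ids to the existing asynchronous block deletion once the traversal completes.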
However, this may cause some problems:
1. When the parent node of the subtree is configured with a quota, the quota
update is no longer synchronous and will be slightly delayed.
2. Because the root directory always has the DirectoryWithQuotaFeature
attribute, we need to update the quotaUsage of the root directory in any case.
However, the root directory has no upper limit configured for its quota, so I
think we can ignore the delayed quota update for the root directory.
To solve the above problems, we can check whether any parent directory of the
subtree is configured with a quota. If none of them is, we use asynchronous
collection. We can also add a configuration option to let users decide whether
to enable this quota check.
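The proposed check could look roughly like the sketch below. Dir,
QuotaCheckSketch, canCollectAsync and the quotaCheckEnabled flag are
hypothetical names used only for illustration; they are not existing HDFS
classes or configuration keys. Following point 2 above, the root directory is
skipped, since its delayed quota update is considered acceptable.

```java
// Hypothetical stand-in for a directory inode; not a real HDFS class.
final class Dir {
  final Dir parent;        // null for the root directory
  final boolean hasQuota;  // true if a quota is configured on this directory

  Dir(Dir parent, boolean hasQuota) {
    this.parent = parent;
    this.hasQuota = hasQuota;
  }
}

final class QuotaCheckSketch {
  /**
   * Decides whether blocks of a deleted subtree may be collected
   * asynchronously.
   *
   * @param subtreeParent     directory that contained the deleted subtree
   * @param quotaCheckEnabled hypothetical user setting; when false, the
   *                          quota check is skipped and async is always used
   */
  static boolean canCollectAsync(Dir subtreeParent, boolean quotaCheckEnabled) {
    if (!quotaCheckEnabled) {
      return true;
    }
    // Walk up the ancestors, stopping before the root (parent == null).
    for (Dir d = subtreeParent; d != null && d.parent != null; d = d.parent) {
      if (d.hasQuota) {
        return false; // a quota must be updated synchronously
      }
    }
    return true;
  }
}
```

With the check enabled, a delete under a quota-configured directory would fall
back to the existing synchronous path, and all other deletes would use
asynchronous collection.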
--
This message was sent by Atlassian Jira
(v8.20.10#820010)