zhu created HDFS-16000:
--------------------------
Summary: HDFS : Rename performance optimization
Key: HDFS-16000
URL: https://issues.apache.org/jira/browse/HDFS-16000
Project: Hadoop HDFS
Issue Type: Improvement
Components: hdfs, namenode
Affects Versions: 3.1.4, 3.3.1
Environment: Moving a large directory with rename takes a long time. For
example, moving a directory containing about 10 million (1000W) entries takes
about 40 seconds. When a large amount of data is deleted to the trash, such a
large-directory move occurs when the trash creates a checkpoint. Users may also
trigger a large-directory move directly. In both cases the NameNode holds its
lock for too long and may be killed by ZKFC. A flame graph shows that most of
the time is spent creating EnumCounters objects.
*I think the following two changes can make rename more efficient:*
* h3. *QuotaCounts calculation optimization:*
## Create a single QuotaCounts object when computing a directory's quotaCount,
and pass it as a parameter to each subsequent calculation function, so that no
new EnumCounters object is created for each calculation.
## The flame graph also shows that modifying QuotaCounts through a lambda takes
longer than calling an ordinary method, so an ordinary method is used to update
the QuotaCounts values.
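The accumulator idea in the two points above can be illustrated with a minimal, self-contained sketch. It is not the HDFS code: `Counts`, `Node`, `computeAllocating` and `computeInto` are hypothetical stand-ins for QuotaCounts, the inode tree and the quota calculation functions. The allocating variant creates one counter object per visited node, while the accumulating variant updates a single caller-supplied object through a plain method call (no lambda).

```java
import java.util.ArrayList;
import java.util.List;

class QuotaAccumulatorSketch {
    /** Hypothetical stand-in for QuotaCounts: a mutable pair of counters. */
    static final class Counts {
        long nameSpace;
        long storageSpace;

        /** Plain method update, no lambda involved. */
        void add(long ns, long ss) {
            nameSpace += ns;
            storageSpace += ss;
        }
    }

    /** Hypothetical inode: size is 0 for directories. */
    static final class Node {
        final long size;
        final List<Node> children = new ArrayList<>();

        Node(long size) { this.size = size; }

        Node addChild(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Allocating variant: one new Counts object per visited node. */
    static Counts computeAllocating(Node n) {
        Counts c = new Counts();
        c.add(1, n.size);
        for (Node child : n.children) {
            Counts sub = computeAllocating(child);
            c.add(sub.nameSpace, sub.storageSpace);
        }
        return c;
    }

    /** Accumulating variant: the caller supplies one Counts for the whole walk. */
    static void computeInto(Node n, Counts acc) {
        acc.add(1, n.size);
        for (Node child : n.children) {
            computeInto(child, acc);
        }
    }

    public static void main(String[] args) {
        Node root = new Node(0)
            .addChild(new Node(100))
            .addChild(new Node(0).addChild(new Node(50)));

        Counts allocated = computeAllocating(root);
        Counts accumulated = new Counts();
        computeInto(root, accumulated);

        // Both variants yield the same totals; only the allocation count differs.
        System.out.println(allocated.nameSpace + " " + allocated.storageSpace);
        System.out.println(accumulated.nameSpace + " " + accumulated.storageSpace);
    }
}
```

Both variants return the same totals. For a tree with N nodes the first allocates N counter objects and the second allocates one, which is the saving this proposal is after.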
* h3. *Rename logic optimization:*
## Regardless of whether the source directory and the target directory have
quotas configured, a rename currently calculates the quota count three times:
the first time to check whether the moved directory would exceed the target
directory's quota, the second time to compute the moved directory's quota usage
in order to update the source directory, and the third time to compute the
moved directory's quota usage in order to update the target directory.
I think some of these three quota calculations are unnecessary. For example, if
none of the parent directories of the source directory and target directory has
a quota configured, there is no need to calculate quotaCount at all. Even if
both the source directory and the target directory use quotas, there is no need
to calculate the quota three times: the first and third calculations use the
same logic, so the result only needs to be computed once.
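The proposed rename flow could be sketched roughly as follows. This is an illustration under simplifying assumptions, not the NameNode implementation: `Dir`, `countSubtree`, `hasQuotaOnPath`, `addUsage` and `rename` are hypothetical names, only a namespace quota is modeled, and the source and target paths are assumed to share no ancestors. The `walks` counter stands in for the expensive subtree traversal so that the number of calculations can be observed.

```java
class RenameQuotaSketch {
    /** Hypothetical directory with an optional namespace quota (-1 = no quota). */
    static final class Dir {
        final Dir parent;
        final long nsQuota;
        long nsUsed;

        Dir(Dir parent, long nsQuota) {
            this.parent = parent;
            this.nsQuota = nsQuota;
        }
    }

    /** Number of simulated subtree walks performed so far. */
    static int walks = 0;

    /** Stands in for the full recursive quota count of the moved subtree. */
    static long countSubtree(long inodesInSubtree) {
        walks++;
        return inodesInSubtree;
    }

    /** True if d or any of its ancestors has a quota configured. */
    static boolean hasQuotaOnPath(Dir d) {
        for (Dir p = d; p != null; p = p.parent) {
            if (p.nsQuota >= 0) return true;
        }
        return false;
    }

    /** Usage is tracked only on directories that have a quota. */
    static void addUsage(Dir d, long delta) {
        for (Dir p = d; p != null; p = p.parent) {
            if (p.nsQuota >= 0) p.nsUsed += delta;
        }
    }

    /**
     * Returns false if the move would exceed a quota on the target path.
     * Assumption: the two paths have no common ancestors.
     */
    static boolean rename(Dir srcParent, Dir dstParent, long inodesInSubtree) {
        if (!hasQuotaOnPath(srcParent) && !hasQuotaOnPath(dstParent)) {
            return true; // no quota anywhere: skip the calculation entirely
        }
        long delta = countSubtree(inodesInSubtree); // one walk, reused below
        for (Dir p = dstParent; p != null; p = p.parent) {
            if (p.nsQuota >= 0 && p.nsUsed + delta > p.nsQuota) return false;
        }
        addUsage(srcParent, -delta);
        addUsage(dstParent, delta);
        return true;
    }

    public static void main(String[] args) {
        Dir plainA = new Dir(null, -1);
        Dir plainB = new Dir(null, -1);
        System.out.println(rename(plainA, plainB, 1000) + " walks=" + walks);

        Dir limited = new Dir(null, 1500);
        Dir target = new Dir(limited, -1);
        System.out.println(rename(plainA, target, 1000) + " walks=" + walks);
        System.out.println(rename(plainA, target, 1000) + " walks=" + walks);
    }
}
```

In this sketch a rename between paths without quotas performs no walk at all, and a rename involving a quota performs exactly one walk whose result serves the check and both updates.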
Reporter: zhu
Assignee: zhu
Attachments: 20210428-143238.svg, 20210428-171635-lambda.svg
--
This message was sent by Atlassian Jira
(v8.3.4#803005)