[
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhu updated HDFS-16000:
-----------------------
Attachment: HDFS-16000.patch
> HDFS : Rename performance optimization
> --------------------------------------
>
> Key: HDFS-16000
> URL: https://issues.apache.org/jira/browse/HDFS-16000
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs, namenode
> Affects Versions: 3.1.4, 3.3.1
> Reporter: zhu
> Assignee: zhu
> Priority: Major
> Attachments: 20210428-143238.svg, 20210428-171635-lambda.svg,
> HDFS-16000.patch
>
>
> Moving a large directory with rename takes a long time. For example, moving
> a directory with 1000W (10 million) entries takes about 40 seconds. When a
> large amount of data is deleted to the trash, such a large-directory move
> happens when the trash creates a checkpoint. A user can also trigger a
> large-directory move directly. In both cases the NameNode holds its lock for
> too long and may be killed by ZKFC. A flame graph shows that most of the
> time is spent creating EnumCounters objects.
> *I think the following two changes can make rename more efficient:*
> h3. QuotaCount calculation optimization:
> # Create one QuotaCounts object when calculating a directory's quotaCount
> and pass it as a parameter to each nested calculation function, so that no
> new EnumCounters object is created for each calculation.
> # The flame graph also shows that modifying QuotaCounts through a lambda
> takes longer than an ordinary method, so an ordinary method is used to
> modify the QuotaCounts count.
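
The first point above can be sketched as follows. This is a minimal illustration, not the attached patch: Counts, Node and QuotaWalk are hypothetical stand-ins, not Hadoop's QuotaCounts or EnumCounters, and the tree model (one namespace unit per node, bytes stored on files) is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a quota counter; not Hadoop's QuotaCounts.
final class Counts {
  long nameSpace;
  long storageSpace;

  // Ordinary mutator method: no lambda and no intermediate object.
  void add(long ns, long ss) {
    nameSpace += ns;
    storageSpace += ss;
  }
}

// Hypothetical tree node; a file is simply a node without children.
final class Node {
  final long fileBytes;
  final List<Node> children = new ArrayList<>();

  Node(long fileBytes) { this.fileBytes = fileBytes; }

  Node addChild(Node child) { children.add(child); return this; }
}

final class QuotaWalk {
  static long allocations = 0; // number of Counts objects the baseline creates

  // Baseline: every node allocates its own Counts, which the parent merges.
  static Counts computeAllocating(Node node) {
    Counts c = new Counts();
    allocations++;
    c.add(1, node.fileBytes);
    for (Node child : node.children) {
      Counts sub = computeAllocating(child);
      c.add(sub.nameSpace, sub.storageSpace);
    }
    return c;
  }

  // Proposed shape: the caller creates one Counts and every level adds to it.
  static void computeAccumulating(Node node, Counts acc) {
    acc.add(1, node.fileBytes);
    for (Node child : node.children) {
      computeAccumulating(child, acc);
    }
  }
}
```

Both walks produce the same totals. The difference is that the baseline creates one counter object per visited node, while the accumulating version creates a single counter for the whole walk.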
> h3. Rename logic optimization:
> # Regardless of the source and target directories of a rename, the quota
> count is currently calculated three times: first to check whether the moved
> directory would exceed the target directory's quota, second to calculate the
> moved directory's quota usage and update the source directory, and third to
> calculate the moved directory's quota usage and update the target directory.
> I think some of these three quota calculations are unnecessary. For
> example, if no parent directory of the source directory or the target
> directory is configured with a quota, there is no need to calculate
> quotaCount at all. Even if both the source directory and the target
> directory use quotas, there is no need to calculate the quota three times.
> The calculation logic for the first and third times is the same, so it only
> needs to be calculated once.
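
The rename logic proposed above might look roughly like this. It is a sketch under stated assumptions, not the NameNode code: Dir, RenameQuotaSketch and moveWithQuota are hypothetical names, only a namespace quota is modelled, and the expensive subtree walk is represented by a LongSupplier so that the number of walks can be observed.

```java
import java.util.function.LongSupplier;

// Hypothetical directory with an optional namespace quota; not an HDFS class.
final class Dir {
  static final long NO_QUOTA = -1;
  final Dir parent;
  final long nsQuota; // NO_QUOTA when no quota is configured
  long nsUsed;        // only maintained when a quota is configured

  Dir(Dir parent, long nsQuota) {
    this.parent = parent;
    this.nsQuota = nsQuota;
  }

  boolean hasQuota() { return nsQuota != NO_QUOTA; }
}

final class RenameQuotaSketch {
  // True if dir or any of its ancestors has a quota configured.
  static boolean anyQuotaOnPath(Dir dir) {
    for (Dir d = dir; d != null; d = d.parent) {
      if (d.hasQuota()) return true;
    }
    return false;
  }

  // Adds delta to the usage of every quota-bearing directory on the path.
  static void adjustUsage(Dir dir, long delta) {
    for (Dir d = dir; d != null; d = d.parent) {
      if (d.hasQuota()) d.nsUsed += delta;
    }
  }

  // Returns false if the move would exceed a quota on the target path.
  // countSubtree stands for the expensive walk over the moved directory.
  static boolean moveWithQuota(Dir srcParent, Dir dstParent, LongSupplier countSubtree) {
    if (!anyQuotaOnPath(srcParent) && !anyQuotaOnPath(dstParent)) {
      return true; // no quota anywhere on either path: skip the walk
    }
    long moved = countSubtree.getAsLong(); // single walk, reused below
    adjustUsage(srcParent, -moved);        // release usage on the source path
    for (Dir d = dstParent; d != null; d = d.parent) {
      if (d.hasQuota() && d.nsUsed + moved > d.nsQuota) {
        adjustUsage(srcParent, moved);     // roll back and reject the move
        return false;
      }
    }
    adjustUsage(dstParent, moved);         // charge usage on the target path
    return true;
  }
}
```

Releasing the source usage before checking the target keeps the check correct when both paths share a quota-bearing ancestor, since the moved subtree is then not counted twice against that ancestor.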