[
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17766299#comment-17766299
]
ASF GitHub Bot commented on HDFS-16000:
---------------------------------------
zhuxiangyi commented on code in PR #2964:
URL: https://github.com/apache/hadoop/pull/2964#discussion_r1328469427
##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirRenameOp.java:
##########
@@ -470,17 +475,53 @@ static RenameResult unprotectedRenameTo(FSDirectory fsd,
}
} finally {
if (undoRemoveSrc) {
- tx.restoreSource();
+ tx.restoreSource(srcStoragePolicyCounts);
}
if (undoRemoveDst) { // Rename failed - restore dst
- tx.restoreDst(bsps);
+ tx.restoreDst(bsps, dstStoragePolicyCounts);
}
}
NameNode.stateChangeLog.warn("DIR* FSDirectory.unprotectedRenameTo: " +
"failed to rename " + src + " to " + dst);
throw new IOException("rename from " + src + " to " + dst + " failed.");
}
+ /*
+ * Calculate QuotaCounts based on parent directory and storage policy
+ * 1. If the storage policy of src and dst are different,
+ * calculate the QuotaCounts of src and dst respectively.
+ * 2. If all parent nodes of src and dst are not set with Quota,
+ * there is no need to calculate QuotaCount.
+ * 3. if parent nodes of src and dst have Quota configured,
+ * the QuotaCount is calculated once using the storage policy of src.
+ * */
+ private static void computeQuotaCounts(
+ QuotaCounts srcStoragePolicyCounts,
+ QuotaCounts dstStoragePolicyCounts,
+ INodesInPath srcIIP,
+ INodesInPath dstIIP,
+ BlockStoragePolicySuite bsps,
+ RenameOperation tx) {
+ INode dstParent = dstIIP.getINode(-2);
+ INode srcParentNode = FSDirectory.
+ getFirstSetQuotaParentNode(srcIIP);
+ INode srcInode = srcIIP.getLastINode();
+ INode dstParentNode = FSDirectory.
+ getFirstSetQuotaParentNode(dstIIP);
+ byte srcStoragePolicyID = FSDirectory.getStoragePolicyId(srcInode);
+ byte dstStoragePolicyID = FSDirectory.getStoragePolicyId(dstParent);
+ if (srcStoragePolicyID != dstStoragePolicyID) {
+ srcStoragePolicyCounts.add(srcIIP.getLastINode().
+ computeQuotaUsage(bsps));
+ dstStoragePolicyCounts.add(srcIIP.getLastINode()
+ .computeQuotaUsage(bsps, dstParent.getStoragePolicyID(), false,
+ Snapshot.CURRENT_STATE_ID));
+ } else if (srcParentNode != dstParentNode || tx.withCount != null) {
+
srcStoragePolicyCounts.add(srcIIP.getLastINode().computeQuotaUsage(bsps));
+ dstStoragePolicyCounts.add(srcStoragePolicyCounts);
+ }
Review Comment:
If this is the case, it can be understood that src and dst have a configured
quota, and src is the isSrcInSnapshot attribute.
> HDFS : Rename performance optimization
> --------------------------------------
>
> Key: HDFS-16000
> URL: https://issues.apache.org/jira/browse/HDFS-16000
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs, namenode
> Affects Versions: 3.1.4, 3.3.1
> Reporter: Xiangyi Zhu
> Assignee: Xiangyi Zhu
> Priority: Major
> Labels: pull-request-available
> Attachments: 20210428-143238.svg, 20210428-171635-lambda.svg,
> HDFS-16000.patch
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> It takes a long time to move a large directory with rename. For example, it
> takes about 40 seconds to move a 1000W directory. When a large amount of data
> is deleted to the trash, the move large directory will occur when the recycle
> bin makes checkpoint. In addition, the user may also actively trigger the
> move large directory operation, which will cause the NameNode to lock too
> long and be killed by Zkfc. Through the flame graph, it is found that the
> main time consuming is to create the EnumCounters object.
>
> h3. Rename logic optimization:
> * Regardless of whether the rename operation is the source directory and the
> target directory, the quota count must be calculated three times. The first
> time, check whether the moved directory exceeds the target directory quota,
> the second time, calculate the mobile directory quota to update the source
> directory quota, and the third time, calculate the mobile directory
> configuration update to the target directory.
> * I think some of the above three quota quota calculations are unnecessary.
> For example, if all parent directories of the source directory and target
> directory are not configured with quota, there is no need to calculate
> quotaCount. Even if both the source directory and the target directory use
> quota, there is no need to calculate the quota three times. The calculation
> logic for the first and third times is the same, and it only needs to be
> calculated once.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]