[
https://issues.apache.org/jira/browse/HIVE-28523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17888836#comment-17888836
]
Butao Zhang commented on HIVE-28523:
------------------------------------
[~liux] Please see my comment
[https://github.com/apache/hive/pull/5447#discussion_r1797634055]
I think your change is only useful for Hive3. It is unnecessary on the
Hive4/master branch, which does not have this performance problem.
Thanks.
> Performance issues that may occur when tables or partitions are deleted
> ------------------------------------------------------------------------
>
> Key: HIVE-28523
> URL: https://issues.apache.org/jira/browse/HIVE-28523
> Project: Hive
> Issue Type: Improvement
> Security Level: Public(Viewable by anyone)
> Components: Standalone Metastore
> Reporter: liux
> Assignee: liux
> Priority: Major
> Labels: pull-request-available
> Attachments: ME1726238367718.jpg
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> 1. Iterating over all partitions when deleting a table or its partitions
> may cause performance problems.
> Location:
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
> for (String partName : partNames) {
> Path partPath = wh.getDnsPath(new Path(pathString));
> }
> Assuming each wh.getDnsPath call takes about 10 ms, iterating over a table
> with 200,000 partitions takes about 33 minutes (200,000 x 10 ms = 2,000 s),
> which can cause the drop of a large table or its partitions to time out.
> 2. It is not necessary to execute the wh.getDnsPath(new Path(pathString))
> statement for every partition name. It only needs to be executed when the
> partition location is not a subdirectory of the table directory.
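
For illustration only, the idea in point 2 can be sketched as below. This is not the HMSHandler code or the patch from the pull request: the class name, the helper names, and the plain string prefix check are all hypothetical, and resolve() merely stands in for wh.getDnsPath(new Path(pathString)). The sketch also assumes the table and partition locations are stored in the same normalized form (same scheme and authority), so that a prefix comparison is meaningful.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: call the expensive resolver only for partitions
// whose location lies outside the table directory.
class PartitionPathSketch {
    // Counts resolver invocations so the saving is visible.
    static int resolveCalls = 0;

    // Stand-in for wh.getDnsPath(new Path(pathString)); assumed to be slow.
    static String resolve(String path) {
        resolveCalls++;
        return path;
    }

    // True if partPath is strictly below tablePath (simple prefix check,
    // assuming both strings are already in the same normalized form).
    static boolean isUnderTable(String tablePath, String partPath) {
        String prefix = tablePath.endsWith("/") ? tablePath : tablePath + "/";
        return partPath.startsWith(prefix);
    }

    // Returns resolved locations of partitions stored outside the table
    // directory; partitions inside it are skipped without resolving.
    static List<String> pathsOutsideTable(String tablePath, List<String> partLocations) {
        List<String> outside = new ArrayList<>();
        for (String location : partLocations) {
            if (!isUnderTable(tablePath, location)) {
                outside.add(resolve(location));
            }
        }
        return outside;
    }

    public static void main(String[] args) {
        List<String> locations = List.of(
            "hdfs://nn/warehouse/t/p=1",
            "hdfs://nn/warehouse/t/p=2",
            "hdfs://nn/elsewhere/p=3");
        List<String> outside = pathsOutsideTable("hdfs://nn/warehouse/t", locations);
        System.out.println(outside + " resolved with " + resolveCalls + " call(s)");
    }
}
```

With this shape the number of resolver calls scales with the partitions stored outside the table directory rather than with the total partition count, which is the saving the description argues for.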
--
This message was sent by Atlassian Jira
(v8.20.10#820010)