[ 
https://issues.apache.org/jira/browse/HIVE-28523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17888836#comment-17888836
 ] 

Butao Zhang commented on HIVE-28523:
------------------------------------

[~liux]  Please see my comment 
[https://github.com/apache/hive/pull/5447#discussion_r1797634055]

I think your change is only useful for Hive 3. On the Hive4/master branch the 
change is unnecessary, because that branch does not have this performance problem.

Thanks.

> Performance issues that may occur when  tables or partitions are deleted
> ------------------------------------------------------------------------
>
>                 Key: HIVE-28523
>                 URL: https://issues.apache.org/jira/browse/HIVE-28523
>             Project: Hive
>          Issue Type: Improvement
>      Security Level: Public(Viewable by anyone) 
>          Components: Standalone Metastore
>            Reporter: liux
>            Assignee: liux
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: ME1726238367718.jpg
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> 1. The loop that runs when a table or its partitions are deleted may have 
> performance problems.
> Location: 
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
> for (String partName : partNames) {
>     Path partPath = wh.getDnsPath(new Path(pathString));
> }
> Assuming that each wh.getDnsPath call takes about 10 ms, looping over a 
> table with 200,000 partitions takes about 33 minutes (200,000 x 10 ms = 
> 2,000 s), which may cause the deletion of a large table or of its 
> partitions to time out.
> 2. It is not necessary to execute the wh.getDnsPath(new Path(pathString)) 
> statement for every partition name in the loop. It is only necessary to 
> execute it when the partition's location is not a subdirectory of the 
> table directory.
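
The idea in point 2 can be illustrated with a minimal, self-contained sketch. 
It is not the HMSHandler code and not the change in the pull request: resolve() 
is a hypothetical stand-in for wh.getDnsPath, paths are plain strings rather 
than Hadoop Path objects, and the class name, method names, and the 
"hdfs://nn:8020" prefix are all assumptions made for illustration. It only shows 
the expensive call being skipped for partitions located under the table directory.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionPathSketch {
    /** Counts calls to the stand-in resolver so the saving is visible. */
    public static int resolveCalls = 0;

    /** Hypothetical stand-in for wh.getDnsPath(...); assume it is expensive. */
    public static String resolve(String path) {
        resolveCalls++;
        return "hdfs://nn:8020" + path;
    }

    /** True when partPath is the table directory itself or lies below it. */
    public static boolean isUnderTable(String tablePath, String partPath) {
        String prefix = tablePath.endsWith("/") ? tablePath : tablePath + "/";
        return partPath.equals(tablePath) || partPath.startsWith(prefix);
    }

    /** Resolves only the partitions stored outside the table directory. */
    public static List<String> externalPartitionPaths(String tablePath,
                                                      List<String> partPaths) {
        List<String> result = new ArrayList<>();
        for (String partPath : partPaths) {
            // Skip the expensive call for partitions inside the table directory.
            if (!isUnderTable(tablePath, partPath)) {
                result.add(resolve(partPath));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> parts = List.of(
            "/warehouse/t/ds=2024-01-01",
            "/warehouse/t/ds=2024-01-02",
            "/other/location/ds=2024-01-03");
        List<String> external = externalPartitionPaths("/warehouse/t", parts);
        System.out.println(external + " resolved with " + resolveCalls + " call(s)");
    }
}
```

In this sketch the number of expensive calls grows with the number of 
partitions stored outside the table directory, not with the total partition 
count.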



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
