[
https://issues.apache.org/jira/browse/HIVE-22548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987441#comment-16987441
]
mahesh kumar behera commented on HIVE-22548:
--------------------------------------------
The directory listing is required by the caller. Earlier there were two
{{listStatus}} calls; they have now been merged into one. The listing done in
removeEmptyDpDirectory is reused by removeTempOrDuplicateFiles. The listing is
kept in removeEmptyDpDirectory because that method is called in parallel for
multiple partitions, which reduces execution time.
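The pattern described above can be sketched in plain Java: each partition directory is listed exactly once, that single listing serves both the empty-directory check and the temp-file cleanup, and partitions are processed in parallel. This is a minimal, hypothetical illustration, not Hive's code. It uses java.nio.file instead of Hadoop's FileSystem.listStatus, the class and method names (ParallelPartitionCleanup, cleanPartition, cleanAll) are invented, the "_tmp." prefix is an assumed temp-file convention, and duplicate-file detection is omitted.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; not Hive's Utilities implementation.
public class ParallelPartitionCleanup {

    /** Lists a directory exactly once (stands in for a single listStatus call). */
    static List<Path> listOnce(Path dir) throws IOException {
        List<Path> entries = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                entries.add(p);
            }
        }
        return entries;
    }

    /**
     * Cleans one partition directory using a single listing.
     * Returns the surviving files; an empty directory is deleted and an empty list returned.
     */
    static List<Path> cleanPartition(Path partDir) throws IOException {
        List<Path> listing = listOnce(partDir);   // the only listing for this partition
        if (listing.isEmpty()) {
            Files.delete(partDir);                // remove empty dynamic-partition directory
            return listing;
        }
        List<Path> kept = new ArrayList<>();
        for (Path p : listing) {                  // reuse the same listing, no second call
            String name = p.getFileName().toString();
            if (name.startsWith("_tmp.")) {       // assumed temp-file naming convention
                Files.delete(p);
            } else {
                kept.add(p);
            }
        }
        return kept;
    }

    /** Runs cleanPartition for every partition directory on a fixed-size thread pool. */
    static List<List<Path>> cleanAll(List<Path> partDirs, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<Path>>> futures = new ArrayList<>();
            for (Path dir : partDirs) {
                futures.add(pool.submit(() -> cleanPartition(dir)));
            }
            List<List<Path>> results = new ArrayList<>();
            for (Future<List<Path>> f : futures) {
                results.add(f.get());             // rethrows any per-partition failure
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

In this sketch, each partition costs one directory listing instead of two, and the listings for different partitions overlap in time, which is where the savings would come from on high-latency cloud storage.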
> Optimise Utilities.removeTempOrDuplicateFiles when moving files to final
> location
> ---------------------------------------------------------------------------------
>
> Key: HIVE-22548
> URL: https://issues.apache.org/jira/browse/HIVE-22548
> Project: Hive
> Issue Type: Improvement
> Components: Hive
> Affects Versions: 3.1.2
> Reporter: Rajesh Balamohan
> Assignee: mahesh kumar behera
> Priority: Major
> Attachments: HIVE-22548.01.patch
>
>
> {{Utilities.removeTempOrDuplicateFiles}}
> is very slow with cloud storage, as it executes {{listStatus}} twice and also
> runs in single-threaded mode.
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L1629
--
This message was sent by Atlassian Jira
(v8.3.4#803005)