[ 
https://issues.apache.org/jira/browse/HIVE-25990?focusedWorklogId=739882&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-739882
 ]

ASF GitHub Bot logged work on HIVE-25990:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 11/Mar/22 06:06
            Start Date: 11/Mar/22 06:06
    Worklog Time Spent: 10m 
      Work Description: rbalamohan commented on a change in pull request #3058:
URL: https://github.com/apache/hive/pull/3058#discussion_r824412526



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
##########
@@ -1496,7 +1499,13 @@ public static void mvFileToFinalPath(Path specPath, Configuration hconf,
           // for CTAS or Create MV statements
           perfLogger.perfLogBegin("FileSinkOperator", "moveSpecifiedFileStatus");
           LOG.debug("CTAS/Create MV: Files being renamed:  " + filesKept.toString());
-          moveSpecifiedFilesInParallel(hconf, fs, tmpPath, specPath, filesKept);
+          if (conf.getTable() != null && conf.getTable().getTableType().equals(TableType.EXTERNAL_TABLE)) {
+            // Do this optimisation only for External tables.
+            createFileList(filesKept, tmpPath, specPath, fs);
+          } else {
+            List<String> filesKeptPaths = filesKept.stream().map(x -> x.getPath().toString()).collect(Collectors.toList());

Review comment:
       filesKept is a Set. Is it possible to keep it as a Set after the mapping 
(something like Collectors.toSet())? If so, there is no need to change the 
signature of moveSpecifiedFilesInParallel.
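
A minimal sketch of the suggestion, for illustration only. FakeFileStatus is a hypothetical stand-in for Hadoop's FileStatus (only getPath() is modelled, and it returns a plain String here), and toPathSet is an invented helper name; neither appears in the pull request. The point is simply that the same stream can end in Collectors.toSet() instead of Collectors.toList():

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;

public class KeptFilesExample {

  /** Hypothetical stand-in for Hadoop's FileStatus; only getPath() is modelled. */
  static final class FakeFileStatus {
    private final String path;

    FakeFileStatus(String path) {
      this.path = path;
    }

    String getPath() {
      return path;
    }
  }

  /** Maps each kept file to its path string and collects the result into a Set. */
  static Set<String> toPathSet(Set<FakeFileStatus> filesKept) {
    return filesKept.stream()
        .map(status -> status.getPath().toString())
        .collect(Collectors.toSet());
  }

  public static void main(String[] args) {
    Set<FakeFileStatus> filesKept = new LinkedHashSet<>();
    filesKept.add(new FakeFileStatus("s3a://bucket/tmp/000000_0"));
    filesKept.add(new FakeFileStatus("s3a://bucket/tmp/000001_0"));

    Set<String> filesKeptPaths = toPathSet(filesKept);
    System.out.println(filesKeptPaths.size() + " paths collected");
  }
}
```

Note that Collectors.toSet() makes no guarantee about the concrete Set type or its iteration order, so a caller that needs a stable order would have to collect into a specific implementation instead.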






Issue Time Tracking
-------------------

    Worklog Id:     (was: 739882)
    Time Spent: 50m  (was: 40m)

> Optimise multiple copies in case of CTAS in external tables for Object stores
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-25990
>                 URL: https://issues.apache.org/jira/browse/HIVE-25990
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ayush Saxena
>            Assignee: Ayush Saxena
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Presently, for CTAS with external tables, there are two rename operations: 
> one from tmp to _ext and then one from _ext to the actual target.
> On object stores, a rename results in an actual copy. Avoid the rename from 
> tmp to _ext by instead creating a list of the files to be copied, which the 
> move task can consume to copy directly from tmp to the actual target.
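
The idea in the description can be sketched roughly as follows. This is not Hive's implementation: it uses java.nio.file on a local filesystem rather than a Hadoop FileSystem, and the class name, method names, and manifest file name are all assumptions made for illustration. One step records the kept files in a list instead of renaming them, and a later step reads that list and copies each file straight to the target:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ManifestMoveSketch {

  /** Writes one source path per line to a manifest instead of renaming any file. */
  static Path writeManifest(List<Path> filesKept, Path manifest) throws IOException {
    List<String> lines = new ArrayList<>();
    for (Path file : filesKept) {
      lines.add(file.toAbsolutePath().toString());
    }
    return Files.write(manifest, lines, StandardCharsets.UTF_8);
  }

  /** Reads the manifest and copies each listed file directly into the target directory. */
  static List<Path> copyFromManifest(Path manifest, Path targetDir) throws IOException {
    Files.createDirectories(targetDir);
    List<Path> copied = new ArrayList<>();
    for (String line : Files.readAllLines(manifest, StandardCharsets.UTF_8)) {
      if (line.isEmpty()) {
        continue; // skip blank lines in the manifest
      }
      Path source = Paths.get(line);
      Path destination = targetDir.resolve(source.getFileName());
      copied.add(Files.copy(source, destination, StandardCopyOption.REPLACE_EXISTING));
    }
    return copied;
  }

  /** Runs the two steps end to end in a temp directory; returns the copied file names, sorted. */
  static List<String> demo() {
    try {
      Path root = Files.createTempDirectory("ctas-sketch");
      Path tmpDir = Files.createDirectories(root.resolve("tmp"));
      Path targetDir = root.resolve("target");

      List<Path> filesKept = new ArrayList<>();
      for (String name : new String[] {"000000_0", "000001_0"}) {
        filesKept.add(Files.write(tmpDir.resolve(name), name.getBytes(StandardCharsets.UTF_8)));
      }

      Path manifest = writeManifest(filesKept, root.resolve("manifest.txt"));
      List<String> names = new ArrayList<>();
      for (Path file : copyFromManifest(manifest, targetDir)) {
        names.add(file.getFileName().toString());
      }
      Collections.sort(names);
      return names;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("Copied: " + demo());
  }
}
```

In this sketch each file is copied once, from tmp to the target, rather than once per rename.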


