[ 
https://issues.apache.org/jira/browse/HIVE-23140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17076427#comment-17076427
 ] 

Ashutosh Chauhan commented on HIVE-23140:
-----------------------------------------

The previous code path in {{moveSpecifiedFileStatus}} also had the following logic:
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L1163
This logic also triggers when there are existing files in the target dir (not only 
because of runaway tasks). Can you please test CTAS and INSERT INTO 
statements where there are existing files in the table/partition dir, and confirm that the new code 
path still triggers and works correctly (i.e. doesn't overwrite existing files)?
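
To make the scenario concrete, here is a minimal sketch of a move that never overwrites a file already present in the target dir. It is an illustration only: it uses {{java.nio.file}} on the local filesystem as a stand-in for the Hadoop {{FileSystem}} API, and the class name, method name and {{_copy_N}} suffix scheme are hypothetical, not taken from {{Utilities.java}}.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not Hive code: local-filesystem stand-in for the scenario.
class NoOverwriteMove {

    // Moves src into dstDir. If a file with the same name already exists there,
    // retries with "_copy_1", "_copy_2", ... instead of overwriting it.
    static Path moveWithoutOverwrite(Path src, Path dstDir) throws IOException {
        String name = src.getFileName().toString();
        Path target = dstDir.resolve(name);
        int suffix = 0;
        while (true) {
            try {
                // Without REPLACE_EXISTING, Files.move fails if target exists.
                return Files.move(src, target);
            } catch (FileAlreadyExistsException e) {
                suffix++;
                target = dstDir.resolve(name + "_copy_" + suffix);
            }
        }
    }
}
```

A test for the new code path would presumably check the same two things: the pre-existing file keeps its content, and the newly written file still ends up in the table/partition dir.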

> Optimise file move in CTAS 
> ---------------------------
>
>                 Key: HIVE-23140
>                 URL: https://issues.apache.org/jira/browse/HIVE-23140
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>         Attachments: HIVE-23140.1.patch
>
>
> FileSinkOperator can be optimized to run the file move operation (/_tmp.-ext --> 
> /-ext-) in parallel. Currently it invokes 
> {{Utilities.moveSpecifiedFileStatus}} and renames files sequentially, which causes 
> delays on cloud storage. FS rename can be used instead (S3A internally has a parallel 
> rename operation). 
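
The difference between the two approaches in the description can be sketched roughly as follows. This is a simplified illustration under stated assumptions: it uses {{java.nio.file}} on the local filesystem in place of the Hadoop {{FileSystem}} API, the class and method names are hypothetical, and a local filesystem will not show the per-call latency that makes sequential renames slow on cloud storage.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch, not Hive code: contrasts the two move strategies.
class MoveStrategies {

    // Sequential approach: one rename call per file, issued one after another.
    // Returns the number of files moved.
    static int moveFileByFile(Path srcDir, Path dstDir) throws IOException {
        Files.createDirectories(dstDir);
        List<Path> files;
        // Snapshot the listing first so moves do not disturb the iteration.
        try (Stream<Path> listing = Files.list(srcDir)) {
            files = listing.collect(Collectors.toList());
        }
        for (Path file : files) {
            Files.move(file, dstDir.resolve(file.getFileName()));
        }
        return files.size();
    }

    // Single-call approach: rename the whole directory and let the filesystem
    // decide how to carry it out. dstDir must not exist yet.
    static void moveWholeDirectory(Path srcDir, Path dstDir) throws IOException {
        Files.move(srcDir, dstDir);
    }
}
```

Note that the single-call variant in this sketch fails if the destination already exists, so a case with existing files in the target dir needs separate handling.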



