[
https://issues.apache.org/jira/browse/HIVE-14128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401900#comment-15401900
]
Rajesh Balamohan commented on HIVE-14128:
-----------------------------------------
[~ashutoshc] - In the non-partitioned case, there can be multiple part files
within the temp directory. On HDFS, moving this directory is a cheap metadata
rename. On file systems like S3, however, it can still be expensive, because
S3A has no native rename and emulates it by copying and then deleting every
object. E.g., lineitem is a non-partitioned dataset in TPC-H. A simple INSERT
OVERWRITE into it performs the following move at the end of the job. Please
note that this directory internally has 300+ part files, so the rename turns
out to be expensive here.
{noformat}
2016-08-01T04:40:00,154 INFO [JobClose-Thread-0] exec.FileSinkOperator: Moving
tmp dir:
s3a://bucket/lineitem/.hive-staging_hive_2016-08-01_04-31-26_432_5317262787271448273-1/_tmp.-ext-10000
to:
s3a://bucket/lineitem/.hive-staging_hive_2016-08-01_04-31-26_432_5317262787271448273-1/-ext-10000
{noformat}
Should we consider a file-by-file move in such cases?
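For illustration, here is a minimal sketch of what a parallel file-by-file move could look like. It assumes a local filesystem and uses java.nio.file as a stand-in for the Hadoop FileSystem API. The class and method names (ParallelFileMove, moveFilesInParallel), the part-file names and the thread count are all hypothetical. This is not the approach taken in the attached patches.

{noformat}
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: moves the files of a staging directory one by one,
// in parallel, instead of issuing a single directory rename.
public class ParallelFileMove {

    /**
     * Moves every regular file directly under srcDir into dstDir, submitting
     * one move task per file to a fixed-size thread pool.
     *
     * @return the number of files moved
     */
    public static int moveFilesInParallel(Path srcDir, Path dstDir, int threads)
            throws IOException, InterruptedException, ExecutionException {
        Files.createDirectories(dstDir);

        // List the part files up front so the listing is not affected by the moves.
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(srcDir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    files.add(p);
                }
            }
        }

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Path>> pending = new ArrayList<>();
            for (Path file : files) {
                Path target = dstDir.resolve(file.getFileName());
                Callable<Path> move = () -> Files.move(file, target);
                pending.add(pool.submit(move));
            }
            // Wait for every move; get() surfaces a failed move as an ExecutionException.
            for (Future<Path> f : pending) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }
        return files.size();
    }

    public static void main(String[] args) throws Exception {
        // Illustrative layout loosely mirroring the staging directories in the log above.
        Path base = Files.createTempDirectory("hive-staging");
        Path src = base.resolve("_tmp.-ext-10000");
        Path dst = base.resolve("-ext-10000");
        Files.createDirectories(src);
        for (int i = 0; i < 300; i++) {
            Files.write(src.resolve(String.format("%06d_0", i)), new byte[] {0});
        }
        int moved = moveFilesInParallel(src, dst, 16);
        System.out.println("Moved " + moved + " files");
    }
}
{noformat}

On S3A, each per-file move would still be a copy followed by a delete. The gain would therefore come from running those copies concurrently, not from reducing the amount of data copied. The pool size of 16 is an arbitrary assumption.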
> Parallelize jobClose phases
> ---------------------------
>
> Key: HIVE-14128
> URL: https://issues.apache.org/jira/browse/HIVE-14128
> Project: Hive
> Issue Type: Improvement
> Components: Query Processor
> Affects Versions: 1.2.0, 2.0.0, 2.1.0
> Reporter: Ashutosh Chauhan
> Assignee: Ashutosh Chauhan
> Attachments: HIVE-14128.1.patch, HIVE-14128.patch
>
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)