[jira] [Created] (HIVE-14272) ConditionalResolverMergeFiles should keep staging data on HDFS, then copy (no rename) to S3

JIRA Mon, 18 Jul 2016 15:00:43 -0700

Sergio Peña created HIVE-14272:
----------------------------------

             Summary: ConditionalResolverMergeFiles should keep staging data on 
HDFS, then copy (no rename) to S3
                 Key: HIVE-14272
                 URL: https://issues.apache.org/jira/browse/HIVE-14272
             Project: Hive
          Issue Type: Sub-task
            Reporter: Sergio Peña
            Assignee: Sergio Peña



If {{hive.merge.mapfiles}} is true and the output table is on S3, Hive 
generates a conditional plan in which small output files are merged into 
larger ones. 

If the output files written by the initial MR job are small, then a second MR 
job is run to merge the output into larger files (a copy from S3 to S3 in the 
current code).

If the original output files are large enough, the conditional task is 
instead followed by a move/rename, which is very expensive on S3 (a rename 
there is a copy followed by a delete).
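
The branch described above can be sketched roughly as follows. This is only 
an illustrative model, not the actual ConditionalResolverMergeFiles code: the 
function name {{choose_task}}, the task labels, and the threshold value are 
assumptions made for the example (Hive exposes a comparable average-size 
threshold through {{hive.merge.smallfiles.avgsize}}).

```python
def choose_task(file_sizes, avg_size_threshold=16_000_000):
    """Pick the follow-up task for the output of the initial MR job.

    Hypothetical helper: returns "merge" when the average output file is
    smaller than the threshold (a second MR job would merge the files),
    otherwise "move" (the files are only moved/renamed into place).
    """
    if not file_sizes:
        # Assumption for this sketch: nothing to merge, so just move.
        return "move"
    average = sum(file_sizes) / len(file_sizes)
    return "merge" if average < avg_size_threshold else "move"
```

With the output table on S3, both outcomes currently touch S3 twice: the 
"merge" branch reads from and writes to S3, and the "move" branch renames 
within S3.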

We should keep the staging data on HDFS and then copy it (without a rename) 
to S3 as the final files.
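
One way to picture the proposal is a helper that picks the staging location 
from the scheme of the final destination. This is a minimal sketch under 
stated assumptions, not the proposed patch: the function name 
{{staging_location}}, the set of S3 schemes, and both staging paths are 
hypothetical.

```python
from urllib.parse import urlparse

# Assumption: URI schemes treated as S3 for the purpose of this sketch.
S3_SCHEMES = {"s3", "s3a", "s3n"}


def staging_location(final_uri, hdfs_staging_dir="hdfs:///tmp/hive-staging"):
    """Return where intermediate output should be staged (hypothetical).

    For S3 destinations, stage on HDFS so that merging and moving happen
    there and S3 only sees one final copy. For other destinations, stage
    next to the final directory so a rename stays cheap.
    """
    scheme = urlparse(final_uri).scheme.lower()
    if scheme in S3_SCHEMES:
        return hdfs_staging_dir
    return final_uri.rstrip("/") + "/.hive-staging"
```

For example, {{staging_location("s3a://bucket/warehouse/t")}} returns the HDFS 
staging directory, while an HDFS destination keeps its staging directory 
under the table path.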



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
