[ https://issues.apache.org/jira/browse/HIVE-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138824#comment-14138824 ]

Rui Li commented on HIVE-8043:
------------------------------

Hi [~xuefuz],

The DDL task that merges files is triggered by an ALTER TABLE statement:
{code}
ALTER TABLE tbl CONCATENATE;
{code}
In this case, the DDL task creates a {{MergeFileTask}}, which launches an MR 
job to merge the files. This feature currently supports only RC/ORC tables.
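For reference, the same statement can also be scoped to a single partition. This is a sketch assuming {{tbl}} is an RC/ORC table partitioned by a hypothetical {{ds}} column; the partition value is illustrative only:
{code}
-- Merge only the small files of one partition ("ds" is a hypothetical partition column).
ALTER TABLE tbl PARTITION (ds='2014-09-18') CONCATENATE;
{code}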

Strangely, I didn't find anything about this in the wiki or other official 
docs. Maybe I'm missing something?

The main problem I see here is that ideally we should launch the job according 
to the configured execution engine. But the DDL task goes through a different 
semantic analyzer, {{DDLSemanticAnalyzer}}, which always launches an MR job. I 
don't think Tez handles this either.
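To illustrate the problem, here is a sketch assuming the Spark branch accepts {{spark}} as a value of {{hive.execution.engine}}. Per the analysis above, switching the engine would not change how the merge is executed:
{code}
-- Switch the session to the Spark execution engine (Spark branch).
SET hive.execution.engine=spark;
-- Analyzed by DDLSemanticAnalyzer rather than the engine-specific compiler,
-- so MergeFileTask still runs the merge as an MR job.
ALTER TABLE tbl CONCATENATE;
{code}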

> Support merging small files [Spark Branch]
> ------------------------------------------
>
>                 Key: HIVE-8043
>                 URL: https://issues.apache.org/jira/browse/HIVE-8043
>             Project: Hive
>          Issue Type: Task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Rui Li
>              Labels: Spark-M1
>         Attachments: HIVE-8043.1-spark.patch
>
>
> Hive currently supports merging small files with MR as the execution engine. 
> There are options available for this, such as 
> {code}
> hive.merge.mapfiles
> hive.merge.mapredfiles
> {code}
> hive.merge.sparkfiles was already introduced in HIVE-7810. To make it work, we 
> might need a little more research and design.
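For reference, the MR-side merge behaviour mentioned in the description is controlled by session settings like the ones below. This is a sketch; the threshold is an illustrative value (the documented default), not a recommendation, and {{hive.merge.sparkfiles}} only has an effect once the Spark-side merge is implemented:
{code}
-- Merge small files at the end of a map-only job.
SET hive.merge.mapfiles=true;
-- Merge small files at the end of a map-reduce job.
SET hive.merge.mapredfiles=true;
-- Spark counterpart introduced in HIVE-7810.
SET hive.merge.sparkfiles=true;
-- Start the extra merge job when the average output file size is below this many bytes.
SET hive.merge.smallfiles.avgsize=16000000;
{code}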



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
