[ https://issues.apache.org/jira/browse/HIVE-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138824#comment-14138824 ]
Rui Li commented on HIVE-8043:
------------------------------

Hi [~xuefuz],

The DDL task that merges files is an ALTER TABLE statement:
{code}
ALTER TABLE tbl CONCATENATE;
{code}
In this case, the DDL task creates a {{MergeFileTask}}, which launches an MR job to merge the files. This feature currently supports only RCFile and ORC tables. Strangely, I couldn't find anything about this in the wiki or other official docs. Maybe I'm missing something?

The main problem I see is that, ideally, we should launch the job according to the execution engine. But the DDL task uses a different semantic analyzer, {{DDLSemanticAnalyzer}}, and always launches an MR job. I think Tez doesn't handle this either.


> Support merging small files [Spark Branch]
> ------------------------------------------
>
>                 Key: HIVE-8043
>                 URL: https://issues.apache.org/jira/browse/HIVE-8043
>             Project: Hive
>          Issue Type: Task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Rui Li
>              Labels: Spark-M1
>         Attachments: HIVE-8043.1-spark.patch
>
>
> Hive currently supports merging small files with MR as the execution engine.
> There are options available for this, such as
> {code}
> hive.merge.mapfiles
> hive.merge.mapredfiles
> {code}
> hive.merge.sparkfiles is already introduced in HIVE-7810. To make it work, we
> might need a little more research and design on this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)