[ https://issues.apache.org/jira/browse/HIVE-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137618#comment-14137618 ]
Xuefu Zhang commented on HIVE-8043:
-----------------------------------

[~lirui] Thanks for your detailed analysis. I think we need to verify the following:

1. File merging (either through DDL or Hive settings) needs to work for all data formats regardless of the execution engine type. That includes RC, ORC, and other formats. Please verify that file merging works with Spark. If not, check MR.

2. The improvement made in HIVE-7704 might be Tez-only. If this is the case, please identify the work that needs to be done to support it, but we don't have to implement it now, as it's an optimization, which can be done in later milestones.

Thanks.

> Support merging small files [Spark Branch]
> ------------------------------------------
>
>                 Key: HIVE-8043
>                 URL: https://issues.apache.org/jira/browse/HIVE-8043
>             Project: Hive
>          Issue Type: Task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Rui Li
>              Labels: Spark-M1
>
> Hive currently supports merging small files with MR as the execution engine.
> There are options available for this, such as
> {code}
> hive.merge.mapfiles
> hive.merge.mapredfiles
> {code}
> hive.merge.sparkfiles is already introduced in HIVE-7810. To make it work, we
> might need a little more research and design on this.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)