[ 
https://issues.apache.org/jira/browse/HIVE-18206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang Haihua updated HIVE-18206:
-------------------------------
    Description: 
Merge configuration parameters such as {{hive.merge.size.per.task}} determine the 
average file size after the merge stage.

However, we found that this only works for file formats such as 
{{Textfile/Sequencefile}}. With the {{RC/ORC}} file formats, it does not work.

For the {{RC/ORC}} file formats, we found that the file size after the merge stage 
instead depends on parameters such as {{mapreduce.input.fileinputformat.split.maxsize}}.

It would be better to use {{hive.merge.size.per.task}} to determine the average file 
size for the RC/ORC file formats as well, so that all file formats behave consistently.
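
As a rough illustration of why the split size ends up controlling the merged file size, the sketch below estimates the number and average size of output files. It assumes (a simplification, not something stated in this issue) that each merge task consumes one split of at most the configured maximum size and writes exactly one output file. The function name and the example sizes are hypothetical.

```python
from typing import Tuple


def estimate_merged_files(total_bytes: int, max_split_bytes: int) -> Tuple[int, float]:
    """Estimate (file_count, average_file_bytes) after a merge stage.

    Assumption: one merge task per split, one output file per task.
    """
    if total_bytes <= 0 or max_split_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Ceiling division: the last split may be smaller than the maximum.
    file_count = -(-total_bytes // max_split_bytes)
    return file_count, total_bytes / file_count


if __name__ == "__main__":
    total = 10 * 1024**3  # 10 GiB of small input files (example value)
    for split in (64 * 1024**2, 256 * 1024**2):
        count, avg = estimate_merged_files(total, split)
        print(f"max split {split} bytes: {count} files, avg {avg:.0f} bytes")
```

Under that assumption, whichever parameter sets the maximum split size also sets the approximate output file size, which is why the RC/ORC merge appears to ignore {{hive.merge.size.per.task}}.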

The root cause is that, for the RC/ORC file formats, {{MergeFileTask}} does not pick up 
the configuration value held in {{MergeFileWork}}, so the solution is to pass that value into 
{{MergeFileTask}}.
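
The proposed fix can be sketched as follows. This is a minimal Python model, not Hive's Java code: {{MergeWorkSketch}} and {{build_job_conf}} are hypothetical stand-ins for {{MergeFileWork}} and the job setup in {{MergeFileTask}}, and it assumes the work object already carries the target size derived from {{hive.merge.size.per.task}}.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Split-size parameter named in this issue.
SPLIT_MAXSIZE_KEY = "mapreduce.input.fileinputformat.split.maxsize"


@dataclass
class MergeWorkSketch:
    """Hypothetical stand-in for MergeFileWork: holds the planned target size."""
    max_split_size: Optional[int] = None


def build_job_conf(work: MergeWorkSketch, base_conf: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical stand-in for the job setup step of the merge task.

    Copies the base configuration and, when the work object carries a
    target size, applies it as the maximum split size for the merge job.
    """
    job_conf = dict(base_conf)  # do not mutate the caller's configuration
    if work.max_split_size is not None:
        job_conf[SPLIT_MAXSIZE_KEY] = str(work.max_split_size)
    return job_conf


if __name__ == "__main__":
    session_conf = {SPLIT_MAXSIZE_KEY: "64000000"}   # example session value
    work = MergeWorkSketch(max_split_size=256000000)  # example merge target
    print(build_job_conf(work, session_conf)[SPLIT_MAXSIZE_KEY])
```

Without the {{if}} branch, the job would keep the session-level split size, which matches the behaviour reported above for RC/ORC.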



  was:
Merge configuration parameters such as {{hive.merge.size.per.task}} determine the 
average file size after the merge stage.

However, we found that this only works for formats such as {{Textfile/Sequencefile}}; for 
the {{RC/ORC}} file formats, it does not work.

For the {{RC/ORC}} file formats, we found that {{hive.merge.size.per.task}} does not 
take effect; instead, only a split size parameter such as 
{{mapreduce.input.fileinputformat.split.maxsize}} does.

It would be better to use {{hive.merge.size.per.task}} to determine the average file 
size.




> Merge of RC/ORC files should follow other file formats, which use the merge 
> configuration parameters
> --------------------------------------------------------------------------------------------
>
>                 Key: HIVE-18206
>                 URL: https://issues.apache.org/jira/browse/HIVE-18206
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Wang Haihua
>
> Merge configuration parameters such as {{hive.merge.size.per.task}} determine the 
> average file size after the merge stage.
> However, we found that this only works for file formats such as 
> {{Textfile/Sequencefile}}. With the {{RC/ORC}} file formats, it does not work.
> For the {{RC/ORC}} file formats, we found that the file size after the merge stage 
> instead depends on parameters such as {{mapreduce.input.fileinputformat.split.maxsize}}.
> It would be better to use {{hive.merge.size.per.task}} to determine the average 
> file size for the RC/ORC file formats as well, so that all file formats behave consistently.
> The root cause is that, for the RC/ORC file formats, {{MergeFileTask}} does not pick up 
> the configuration value held in {{MergeFileWork}}, so the solution is to pass that value into 
> {{MergeFileTask}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
