Re: merging the size of the reduce output

Yongqiang He, Sun, 13 Jun 2010 22:56:12 -0700

I think there is another parameter, "hive.merge.smallfiles.avgsize", that decides
whether to run the merge job based on the average size of the output files.
The default for that parameter is 16M, so if the average output file size is
larger than 16M, Hive will not merge.
Maybe you can try increasing that value and see.
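
As a rough illustration of that check, here is a minimal Python sketch. It is
not Hive's actual code: the function name should_merge is hypothetical, and
reading "16M" as 16,000,000 bytes is an assumption made for the example.

```python
# Sketch only: mirrors the rule described above, not Hive's actual source.
DEFAULT_AVG_SIZE = 16 * 1000 * 1000  # "16M" read as 16,000,000 bytes (assumption)


def should_merge(file_sizes, avg_size_threshold=DEFAULT_AVG_SIZE):
    """Return True when the average output file size is below the threshold."""
    if not file_sizes:
        return False  # no output files, nothing to merge
    average = sum(file_sizes) / len(file_sizes)
    return average < avg_size_threshold


# Fifty files of 1 MB each: the average is below the default threshold.
print(should_merge([1_000_000] * 50))
# Fifty files of 20 MB each: the average is above the default threshold.
print(should_merge([20_000_000] * 50))
# Same files, with the threshold raised to an arbitrary 64,000,000 bytes.
print(should_merge([20_000_000] * 50, avg_size_threshold=64_000_000))
```

Under this reading, raising the threshold above the current average file size
is what would let the merge job run.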


Thanks
Yongqiang
On 6/13/10 10:41 PM, "Sammy Yu" <[email protected]> wrote:

> Hi,
>    I have both hive.merge.mapfiles and hive.merge.mapredfiles set to true
> via the shell tool and hive-default.xml configuration file.  However, it
> appears somehow the job configuration is changed before the job is submitted.
>  Is there another condition that can cause this to happen?
> 
> Thanks,
> Sammy
>  
> 
> On Sun, Jun 13, 2010 at 7:39 AM, Ted Yu <[email protected]> wrote:
>> Looking at 
>> ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java,
>> hive.merge.mapredfiles is effective if there is a reducer for your job.
>> Otherwise you should set hive.merge.mapfiles to true.
>> 
>> 
>> On Sat, Jun 12, 2010 at 11:22 PM, Sammy Yu <[email protected]> wrote:
>>> Hi, 
>>>    I'm running the latest version of trunk, r953172.  I'm doing a
>>> dynamic partition insert overwrite query which generates a lot of small
>>> files in each of the partitions.  I was hoping this could be solved by
>>> setting hive.merge.mapredfiles to true.  However, it seems that whenever the
>>> job is submitted it is always set to false, so it doesn't seem to have any
>>> effect.  I also tried to modify this property in hive-default.xml, but
>>> it didn't work either.
>>> 
>>> Thanks,
>>> Sammy
>>> 
>>> 
>> 
> 
> 
> 
>
