I think there is another parameter, "hive.merge.smallfiles.avgsize", that determines whether to run the merge job based on the average size of the output files. The default for that parameter is 16M, so if the average output file size is larger than 16M, the merge will not run. Maybe you can try increasing that value to see if it helps.
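As a rough sketch of what I mean, you could try something like the following in the Hive shell before running the insert. The 128000000 value is only an illustrative assumption on my part, not a recommended setting; pick whatever is comfortably above the average size of the files you are seeing.

```sql
-- Enable merging for map-only jobs and for jobs that have a reducer.
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;

-- Raise the average-size threshold (in bytes) below which the merge job runs.
-- The default is 16000000; 128000000 is an assumed example value.
set hive.merge.smallfiles.avgsize=128000000;

-- Print the value back to confirm what the session will actually use.
set hive.merge.smallfiles.avgsize;
```

If the value printed by the last line is not what you set, that would suggest something is overriding the configuration before the job is submitted.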
Thanks
Yongqiang

On 6/13/10 10:41 PM, "Sammy Yu" <[email protected]> wrote:

> Hi,
> I have both hive.merge.mapfiles and hive.merge.mapredfiles set to true
> via the shell tool and hive-default.xml configuration file. However, it
> appears somehow the job configuration is changed before the job is submitted.
> Is there another condition that can cause this to happen?
>
> Thanks,
> Sammy
>
>
> On Sun, Jun 13, 2010 at 7:39 AM, Ted Yu <[email protected]> wrote:
>> Looking at
>> ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java,
>> hive.merge.mapredfiles is effective if there is a reducer for your job.
>> Otherwise you should have set hive.merge.mapfiles to true.
>>
>>
>> On Sat, Jun 12, 2010 at 11:22 PM, Sammy Yu <[email protected]> wrote:
>>> Hi,
>>> I'm running the latest version of trunk r953172. I'm doing a
>>> dynamic partition insert overwrite query which generates a lot of small
>>> files in each of the partitions. I was hoping this could be solved by
>>> setting hive.merge.mapredfiles to true. However, it seems like whenever the
>>> job is submitted it is always set to false, thus it doesn't seem to have any
>>> effect. I also tried to modify this property in the hive-default.xml, but
>>> it didn't work either.
>>>
>>> Thanks,
>>> Sammy
