Try running set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat; before your query. It packs many small files into each map input split, so it may help.
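
For illustration, a session might look roughly like the sketch below. The split-size properties are the standard Hadoop CombineFileInputFormat settings, but the values here are placeholders to tune rather than recommendations, and the table name (logs) and partition column (dt) are hypothetical.

```sql
-- Combine many small files into fewer, larger input splits.
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

-- Assumed values: cap each combined split at roughly 256 MB.
set mapred.max.split.size=256000000;
set mapred.min.split.size.per.node=256000000;
set mapred.min.split.size.per.rack=256000000;

-- Hypothetical query that touches every partition of a hypothetical table.
SELECT dt, count(1) FROM logs GROUP BY dt;
```

With fewer splits, fewer map tasks are launched, so per-task startup overhead should shrink. The files themselves are not merged on disk.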
2010/6/11 Sammy Yu <[email protected]>

> Hi,
> I am having an issue with a large number of 4000 partitions (each being
> very small <10k files). Any queries that I do which involve these
> partitions take an extremely long time to complete (10+ hours), I was
> wondering if there was any easy way in hive without having to merge the
> files improve it's performance. I can see the map reduce jobs are taking a
> long time due to the fact that there are so many separated raw data files
> that need to be read. I saw that HIVE-1332 dealt with using HAR files for
> partitioning. Could this perhaps help performance rather than hurt it,
> given that the queries will be using all the partitions in the har file?
>
> Thanks,
> Sammy
