RE: Dealing with large number of partitions

Ashish Thusoo Fri, 11 Jun 2010 16:17:33 -0700

+1 to that. That should help, provided you are running Hadoop 0.20.

Ashish

________________________________
From: wd [mailto:[email protected]]
Sent: Thursday, June 10, 2010 11:36 PM
To: [email protected]
Subject: Re: Dealing with large number of partitions

Try running set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
before your query; this may help.
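
For illustration, a minimal sketch of how the setting might be used in a Hive session. The table name page_views and the partition column dt are hypothetical placeholders, not taken from this thread; only the set line comes from the suggestion above.

```sql
-- Session-level setting suggested above. It applies to the queries
-- that follow it in the same Hive session.
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

-- Hypothetical table and partition column, used only as an example
-- of a query that scans many small partitions.
SELECT COUNT(*)
FROM page_views
WHERE dt >= '2010-06-01';
```

The setting only lasts for the session, so it has to be issued again (or placed in a startup script) for later sessions.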

2010/6/11 Sammy Yu <[email protected]>
Hi,
   I am having an issue with a large number of partitions: 4000 of them (each
being very small, <10k files).  Any query that involves these partitions takes
an extremely long time to complete (10+ hours).  I was wondering whether there
is an easy way in Hive to improve its performance without having to merge the
files.  I can see that the map reduce jobs take a long time because there are
so many separate raw data files that need to be read.  I saw that HIVE-1332
dealt with using HAR files for partitioning.  Could this perhaps help
performance rather than hurt it, given that the queries will be using all the
partitions in the HAR file?

Thanks,
Sammy
