Re: Dealing with large number of partitions

wd Thu, 10 Jun 2010 23:36:47 -0700

Try running

set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

before your query; it may help.
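The suggestion works by changing how input splits are built. With the default input format, every small file becomes at least one split and therefore one map task. CombineHiveInputFormat packs many small files into a single split, which gives far fewer map tasks. The sketch below is a simplified model of that effect, not Hive's implementation. It assumes plain greedy packing up to a size limit and ignores the node and rack locality grouping that the real combine format performs. The function names, the 8 KB file size, and the 256 MB limit are all hypothetical.

```python
# Simplified model (assumption): without combining, each small file is its own
# split (one map task per file); with combining, files are greedily packed into
# splits of at most max_split_bytes. Real CombineFileInputFormat also groups by
# node/rack locality, which this sketch ignores. All names are hypothetical.

def count_map_tasks_uncombined(file_sizes):
    """Each small file becomes its own split, hence its own map task."""
    return len(file_sizes)

def combine_into_splits(file_sizes, max_split_bytes):
    """Greedily pack file sizes into splits no larger than max_split_bytes.

    A single file larger than max_split_bytes simply gets a split of its own
    here; the real format would cut it at block boundaries instead.
    """
    splits = []
    current = []
    current_size = 0
    for size in file_sizes:
        # Close the current split if adding this file would exceed the limit.
        if current and current_size + size > max_split_bytes:
            splits.append(current)
            current = []
            current_size = 0
        current.append(size)
        current_size += size
    if current:
        splits.append(current)
    return splits

if __name__ == "__main__":
    # Hypothetical workload: 4000 partitions, one 8 KB file each.
    files = [8 * 1024] * 4000
    max_split = 256 * 1024 * 1024  # hypothetical 256 MB split limit
    print("map tasks without combining:", count_map_tasks_uncombined(files))
    print("map tasks with combining:", len(combine_into_splits(files, max_split)))
```

In this toy workload, 4000 map tasks collapse into a single one. In a real cluster, the achievable reduction depends on the configured maximum split size and on data locality.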

2010/6/11 Sammy Yu <[email protected]>

> Hi,
>    I am having an issue with a large number of partitions (about 4000, each
> very small, <10k files). Any query that involves these partitions takes an
> extremely long time to complete (10+ hours). Is there an easy way in Hive to
> improve its performance without having to merge the files? I can see that
> the MapReduce jobs take a long time because there are so many separate raw
> data files to read. I saw that HIVE-1332 dealt with using HAR files for
> partitioning. Could this help performance rather than hurt it, given that
> the queries will use all the partitions in the HAR file?
>
> Thanks,
> Sammy
