Re: How to decrease the number of Mappers (not reducers) ?

Zheng Shao Tue, 25 Aug 2009 13:58:20 -0700

I guess you have a lot of small files in the table.
Can you merge those small files into bigger files?
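
To check that guess against the numbers in your mail, here is a rough
back-of-the-envelope sketch in Python. The 64 MB block size is an assumption
(the stock HDFS default; your cluster may be configured differently), and the
variable names are only for illustration.

```python
import math

# Figures quoted in the original question.
total_bytes = 5_654_500
num_maps = 1001
concurrent_maps = 2

# Assumption: 64 MB HDFS block size (stock default; may differ on your cluster).
block_size = 64 * 1024 * 1024

# Average amount of input each map task receives.
avg_bytes_per_map = total_bytes / num_maps

# Number of sequential "waves" of map tasks when only 2 run at a time.
waves = math.ceil(num_maps / concurrent_maps)

# Number of blocks the same data would occupy as a single merged file.
blocks_if_merged = math.ceil(total_bytes / block_size)

print(f"average input per map: {avg_bytes_per_map:.0f} bytes")
print(f"map waves at {concurrent_maps} at a time: {waves}")
print(f"blocks needed if merged into one file: {blocks_if_merged}")
```

Each map gets only a few KB of input, which is what you would expect if the
table is stored as many tiny files. Merged into one file, the same data would
fit in a single block under the assumed block size.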



Zheng

On Tue, Aug 25, 2009 at 1:08 PM, Ravi Jagannathan <
[email protected]> wrote:

>
> There are too many mappers in Hive. The table has approximately 50K rows
> (5,654,500 bytes).
>
> The query is: select count(1) from TABLE group by COLUMN
>
> There are only 2 nodes.
>
> On the Web UI I can see that 1001 maps are spawned, each of which takes about
> 1 sec to run. Only 2 mappers run at a time, so the map phase takes about
> 15 minutes, which is unacceptable.
>
> After that, the reduce > copy phase takes another 10 minutes. The
> reduce > reduce phase finishes very fast. How can I reduce the number of maps?
>
>
> Things I tried:
> I changed mapred.map.tasks in hadoop-site.xml and restarted Hive and the
> Hadoop server, but the new value does not show up in job.xml, as if Hive
> suppressed the change. The Python Hive client does not allow a set command.
> I also tried set in the CLI, but that has no effect either.
>
> Hadoop 0.19.1, Hive 0.3
>



-- 
Yours,
Zheng
