Hive is spawning too many mappers. The table has approximately 50K rows and is
5,654,500 bytes in size.
The query is: select count(1) from TABLE group by COLUMN
There are only 2 nodes.
On the web UI I can see that 1001 maps are spawned, each of which takes 1 second
to run. Only 2 mappers run at a time, so the map phase takes roughly 15
minutes, which is unacceptable.
After that, the reduce copy phase (reduce > copy) takes another 10 minutes,
while the reduce > reduce phase finishes very quickly. How can I reduce the
number of maps?
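As a rough check on these numbers: with only 2 map slots, the 1001 maps run in waves of 2, so the map phase can finish no sooner than ceil(1001 / 2) waves times the per-map runtime. A minimal sketch of that lower bound, assuming each map holds its slot for exactly 1 second and ignoring task start-up and scheduling overhead (the function name is hypothetical):

```python
import math


def map_phase_lower_bound(num_maps: int, concurrent_slots: int, secs_per_map: float) -> float:
    """Lower bound on wall-clock seconds for the map phase.

    Assumes every map occupies a slot for exactly `secs_per_map` seconds and
    ignores scheduling and JVM start-up overhead, so real runs take longer.
    """
    # Maps execute in "waves" of at most `concurrent_slots` tasks each.
    waves = math.ceil(num_maps / concurrent_slots)
    return waves * secs_per_map


if __name__ == "__main__":
    secs = map_phase_lower_bound(num_maps=1001, concurrent_slots=2, secs_per_map=1.0)
    print(f"{secs:.0f} s")  # 501 s
```

This puts a floor of 501 seconds on the map phase alone, and that floor grows linearly with the number of maps even though the data itself is only about 5.6 MB.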

Things I tried:
I changed hadoop-site.xml and restarted the Hive and Hadoop servers, but the
map parameters I changed, such as mapred.map.tasks, do not show up in job.xml,
as if Hive were suppressing the changes. The Python Hive client does not allow
a set command, and running set from the Hive CLI has no effect either.
Versions: Hadoop 0.19.1, Hive 0.3
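One way to see why changing mapred.map.tasks may not help: assuming the table is stored in a FileInputFormat-based format such as TextInputFormat, Hadoop's old-API FileInputFormat treats mapred.map.tasks only as a hint. The split size is max(mapred.min.split.size, min(totalSize / mapred.map.tasks, blockSize)), and every non-empty input file yields at least one split, so the hint can raise the map count but cannot push it below one map per file. The Python below is a simplified sketch of that rule, not Hive's or Hadoop's actual code; the function names are hypothetical, the 64 MB block size is assumed (the 0.19 default), and empty files and unsplittable (e.g. gzip-compressed) inputs are not modeled.

```python
SPLIT_SLOP = 1.1  # a file's tail may run up to 10% over the split size before it is cut


def compute_split_size(goal_size: int, min_size: int, block_size: int) -> int:
    # Same rule as FileInputFormat's computeSplitSize(): max(minSize, min(goalSize, blockSize)).
    return max(min_size, min(goal_size, block_size))


def count_splits(file_sizes, map_tasks_hint=1, min_split_size=1,
                 block_size=64 * 1024 * 1024):
    """Approximate number of map tasks for a list of input file sizes in bytes."""
    total = sum(file_sizes)
    # goalSize = totalSize / mapred.map.tasks (a hint of 0 is treated as 1).
    goal_size = total // max(map_tasks_hint, 1)
    split_size = compute_split_size(goal_size, max(min_split_size, 1), block_size)
    splits = 0
    for length in file_sizes:
        if length == 0:
            continue  # simplification: empty files are skipped here
        remaining = length
        # Carve full-size splits while the remainder is clearly larger than one split.
        while remaining / split_size > SPLIT_SLOP:
            splits += 1
            remaining -= split_size
        if remaining:
            splits += 1  # the tail of each file always becomes its own split
    return splits
```

For example, count_splits([5_654_500]) gives 1 map and count_splits([5_654_500], map_tasks_hint=2) gives 2, whereas a hypothetical layout of 1001 files of 5,650 bytes each gives 1001 maps regardless of the hint.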
