Hive is spawning too many mappers for a small table. The table has approximately 50K rows (5,654,500 bytes), and the query is: select count(1) from TABLE group by COLUMN. The cluster has only 2 nodes. On the Web UI I can see 1001 maps spawned, each of which takes about 1 sec to run. Only 2 mappers run at a time, which means roughly 1000 x 1 sec, about 15 minutes, for the map phase, which is unacceptable. After that, the reduce > copy phase takes another 10 minutes. The reduce > reduce phase finishes very quickly. How can I reduce the number of maps?
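To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above and assumes (my simplification) that map tasks run strictly 2 at a time with no scheduling or startup overhead; the variable names are just for illustration.

```python
import math

# Figures taken from the job described above
table_bytes = 5_654_500   # total table size in bytes
num_maps = 1001           # map tasks shown in the Web UI
secs_per_map = 1          # observed run time of one map task
concurrent_maps = 2       # map tasks running at the same time

# Input handled by each map task, assuming an even split
bytes_per_map = table_bytes / num_maps

# Number of "waves" needed when only 2 maps run at once
waves = math.ceil(num_maps / concurrent_maps)

# Idealized map-phase time, ignoring per-task overhead (assumption)
ideal_map_phase_secs = waves * secs_per_map

print(f"bytes per map task: {bytes_per_map:.0f}")
print(f"waves of map tasks: {waves}")
print(f"ideal map phase:    {ideal_map_phase_secs} s")
```

So each map gets only about 5.6 KB of input, and even the idealized estimate is several minutes. The 15 minutes I actually see is longer than that, presumably because of per-task overhead that this sketch ignores.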
Things I tried: I changed hadoop-site.xml and restarted the Hive and Hadoop servers, but the mapred.map.tasks parameter I changed does not show up in job.xml, as if Hive suppressed the change. The Python Hive client does not allow a set command. I also tried set from the CLI, but that has no effect either. Versions: Hadoop 0.19.1, Hive 0.3.
