I guess you have a lot of small files in the table. Can you merge those small files into bigger files?
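Hadoop's file-based input formats create at least one map task per non-empty input file, so about 1001 maps over 5,654,500 bytes suggests files of roughly 5.6 KB each. Here is a minimal sketch of one way to merge them. It assumes the table is stored as plain newline-delimited text and that the small files are available on a local filesystem before they are loaded. The function name `merge_small_files`, the `merged-NNNNN` output names, and the 64 MB default target are all hypothetical choices for illustration.

```python
import os


def merge_small_files(src_dir, dst_dir, target_bytes=64 * 1024 * 1024):
    """Concatenate the files in src_dir into fewer, larger files in dst_dir.

    A new output file is started once the current one has reached
    target_bytes. Returns the list of merged file paths.
    """
    os.makedirs(dst_dir, exist_ok=True)
    merged_paths = []
    out = None
    written = 0
    for name in sorted(os.listdir(src_dir)):
        path = os.path.join(src_dir, name)
        if not os.path.isfile(path):
            continue
        # Start a new output file when none is open or the current one is full.
        if out is None or written >= target_bytes:
            if out is not None:
                out.close()
            merged_path = os.path.join(dst_dir, "merged-%05d" % len(merged_paths))
            out = open(merged_path, "wb")
            merged_paths.append(merged_path)
            written = 0
        # The inputs are assumed to be small, so reading each one whole is fine.
        with open(path, "rb") as f:
            data = f.read()
        # Terminate with a newline so rows from adjacent files do not fuse.
        if data and not data.endswith(b"\n"):
            data += b"\n"
        out.write(data)
        written += len(data)
    if out is not None:
        out.close()
    return merged_paths
```

The merged files could then be loaded back into the table, for example with Hive's `LOAD DATA LOCAL INPATH ... OVERWRITE INTO TABLE ...`, so that the query reads a handful of files instead of about a thousand.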
Zheng

On Tue, Aug 25, 2009 at 1:08 PM, Ravi Jagannathan <[email protected]> wrote:

> There are too many mappers in Hive. Table has approximately 50K rows,
> number of bytes = 5,654,500.
>
> the query is select count(1) from TABLE group by COLUMN
>
> There are only 2 nodes.
>
> On the Web UI I can see there are 1001 maps spawned, each of which takes 1
> sec to run. There are only 2 mappers running at a time, this means 10001 =
> 15 minutes seconds to run which is unacceptable.
>
> Thereafter the reduce> copy takes another 10 minutes. The reducers
> reduce>reduce finished very fast. How can I reduce the number of maps.
>
> Things I tried:
> I tried changing the hadoop-site.xml and restarting hive and hadoop server.
> But the map parameters mapred.map.tasks which I changed are not showing up
> in job.xml - as if Hive suppressed these changes. The python hive client
> does not allow a set command. I tried the cli set, but that has no effect
> either.
>
> Hadoop-0.19.1, hive 0.3

--
Yours,
Zheng
