Not sure if this got answered. The second MR job in this case concatenates the outputs, so that the number of files generated is much smaller than the mapper parallelism. This is an advantage for jobs that later consume the data. This feature was added recently. You can, however, turn it off using the following configuration variable:
hive.merge.mapfiles=false

This is true by default.

Ashish

________________________________
From: Min Zhou [mailto:[email protected]]
Sent: Monday, August 03, 2009 8:02 PM
To: hive-user
Subject: why insert overwrite table tmp partition(dt=1) select bar, foo from pokes NEEDS 2 MR JOBS?

I thought one map only job is ok. try

hive> explain insert overwrite table tmp partition(dt=1) select bar, foo from pokes;

Thanks,
Min
--
My research interests are distributed systems, parallel computing and bytecode based virtual machine.

My profile:
http://www.linkedin.com/in/coderplay
My blog:
http://coderplay.javaeye.com
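
For anyone trying the setting from the reply, here is a minimal sketch of a Hive CLI session. It assumes the variable is set per session with the CLI's `set` command and reuses the table names from the original question. The `--` lines are annotations for the reader and can be dropped. Whether the plan shrinks to a single map-only job may depend on the Hive version, so compare the explain output before and after.

```sql
-- Turn off the merge step for map-only jobs (current session only).
set hive.merge.mapfiles=false;

-- Re-run the explain from the question and check how many stages remain.
explain insert overwrite table tmp partition(dt=1) select bar, foo from pokes;
```

Leaving the variable at its default keeps the merge job, which gives fewer output files at the cost of the extra MR job.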
