Not sure if this got answered. The second MR job in this case concatenates
the outputs, so that far fewer files are generated than the mapper
parallelism would otherwise produce. This helps jobs that consume the data
later. The feature was added recently. You can, however, turn it off with
the following configuration variable.

hive.merge.mapfiles=false

The variable is set to true by default.
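
For example, a minimal sketch of turning the merge off for one session and
re-checking the plan, using the query from the original question. This
assumes the variable is set from the Hive CLI with the "set" command before
the query runs; the comment lines are only annotations and may need to be
dropped on CLI versions that do not accept them.

```sql
-- Assumption: "set" changes the value for the current session only.
set hive.merge.mapfiles=false;

-- Re-run explain to check whether the second (merge) job is still planned.
explain insert overwrite table tmp partition(dt=1) select bar, foo from pokes;
```

With the merge off, expect roughly one output file per mapper, so the
trade-off is fewer jobs against more small files for downstream consumers.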

Ashish
________________________________
From: Min Zhou [mailto:[email protected]]
Sent: Monday, August 03, 2009 8:02 PM
To: hive-user
Subject: why insert overwrite table tmp partition(dt=1) select bar, foo from 
pokes NEEDS 2 MR JOBS?

I thought a single map-only job would be enough. Try:
hive> explain insert overwrite table tmp partition(dt=1) select bar, foo from 
pokes;


Thanks,
Min
--
My research interests are distributed systems, parallel computing and bytecode 
based virtual machine.

My profile:
http://www.linkedin.com/in/coderplay
My blog:
http://coderplay.javaeye.com
