Right now, MapJoin is not fully optimized, so the plan you see (a map-only
job followed by a map-reduce job) is the expected behavior. MapJoin writes
its results to a temporary file, and the ORDER BY is then processed on that
file in a separate map-reduce job.
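
The Python sketch below simulates the two stages on small in-memory rows to make the plan concrete. It is not Hive's implementation. The extra t2 column `c`, the JSON-lines temp file, and all function names are hypothetical and exist only for illustration. Stage 1 plays the role of the map-only MapJoin: t1, the hinted table, is loaded into a hash table, and t2 is streamed past it. The joined rows go to a temp file. Stage 2 reads that file back and does the ORDER BY on t1.b.

```python
import json
import os
import tempfile


def map_join_stage(t1_rows, t2_rows, tmp_path):
    """Stage 1 (map-only): hash-join t2 against an in-memory copy of t1.

    Rows are (a, b) for t1 and (a, c) for t2; column c is hypothetical.
    """
    # Build a hash table from the hinted (small) table t1, keyed on a.
    table = {}
    for a, b in t1_rows:
        table.setdefault(a, []).append((a, b))
    # Stream t2 past the hash table, as a mapper would, and write every
    # joined row to the temp file. No sorting happens in this stage.
    with open(tmp_path, "w") as f:
        for a2, c in t2_rows:
            for a1, b in table.get(a2, []):
                f.write(json.dumps([a1, b, a2, c]) + "\n")


def order_by_stage(tmp_path):
    """Stage 2: read the temp file back and sort on t1.b (index 1).

    In Hive this is a separate map-reduce job; here it is an in-process sort.
    """
    with open(tmp_path) as f:
        rows = [json.loads(line) for line in f]
    return sorted(rows, key=lambda row: row[1])


def run_query(t1_rows, t2_rows):
    """Run both stages, using a temp file as the hand-off between them."""
    fd, tmp_path = tempfile.mkstemp(suffix=".jsonl")
    os.close(fd)
    try:
        map_join_stage(t1_rows, t2_rows, tmp_path)
        return order_by_stage(tmp_path)
    finally:
        os.remove(tmp_path)
```

For example, `run_query([(1, "z"), (2, "m")], [(1, "x"), (2, "y"), (3, "w")])` drops the unmatched t2 row with a = 3 and returns the joined rows ordered by t1.b. The temp file is the point of the sketch: the join cannot also produce the global ordering, so a second pass over its output is needed.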


Thanks,
-namit


-----Original Message-----
From: Sarah Sproehnle [mailto:[email protected]] 
Sent: Friday, April 30, 2010 5:25 PM
To: [email protected]
Subject: mapjoin execution plan

Hi,

I am confused by the execution plan for a query.  First I did:
SELECT * FROM t1 JOIN t2 ON (t1.a = t2.a) ORDER BY t1.b;

As expected, EXPLAIN reported that there would be 2 MR stages (one for
the reduce-side join and one for the order by).

So I added a MAPJOIN(t1) hint and expected a single MR stage, but what
I got was (I think) a map-only job and a map-reduce job.  Is this
normal?

Explain plan: http://pastebin.com/pEyT22vC

Thanks,
Sarah
-- 
get hadoop: cloudera.com/hadoop
online training: cloudera.com/hadoop-training
blog: cloudera.com/blog
twitter: twitter.com/cloudera
