Right now, mapjoin is not fully optimized, so this is the expected behavior. MapJoin writes its results to a temp file as a map-only job, and the ORDER BY is then processed on that file in a separate map-reduce job.
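
For reference, a minimal sketch of the two queries being compared. The exact hinted query was not included in the thread, so the placement of the hint here is an assumption based on the standard Hive hint syntax; t1, t2, a, and b are the names from the original message.

```sql
-- Reduce-side join: EXPLAIN reported two MR stages
-- (one for the join, one for the ORDER BY).
EXPLAIN
SELECT * FROM t1 JOIN t2 ON (t1.a = t2.a) ORDER BY t1.b;

-- Same query with a map-join hint on t1 (assumed form of the hinted query).
-- The join runs as a map-only job that writes to a temp file,
-- and the ORDER BY runs as a separate map-reduce job over that file.
EXPLAIN
SELECT /*+ MAPJOIN(t1) */ * FROM t1 JOIN t2 ON (t1.a = t2.a) ORDER BY t1.b;
```

Comparing the stage lists in the two EXPLAIN outputs should show where the extra job comes from.
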
Thanks,
-namit

-----Original Message-----
From: Sarah Sproehnle [mailto:[email protected]]
Sent: Friday, April 30, 2010 5:25 PM
To: [email protected]
Subject: mapjoin execution plan

Hi,

I am confused by the execution plan for a query. First I did:

SELECT * FROM t1 JOIN t2 ON (t1.a = t2.a) ORDER BY t1.b;

As expected, EXPLAIN reported that there would be 2 MR stages (one for the reduce-side join and one for the order by). So I added a MAPJOIN(t1) hint and expected a single MR stage, but what I got was (I think) a map-only job and a map-reduce job. Is this normal?

Explain plan: http://pastebin.com/pEyT22vC

Thanks,
Sarah

--
get hadoop: cloudera.com/hadoop
online training: cloudera.com/hadoop-training
blog: cloudera.com/blog
twitter: twitter.com/cloudera
