I don't think hash table generation is needed for SMB joins. Could you check the result of explain extended?
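
In case it helps, a minimal sketch of what I mean, run with the same session settings as the failing query. The table and column names below are placeholders, not taken from your query:

```sql
-- Placeholder names: big_a, big_b, key and value are illustrative only.
-- Both tables are assumed to be bucketed and sorted on the join column.
EXPLAIN EXTENDED
SELECT a.key, b.value
FROM big_a a
JOIN big_b b ON a.key = b.key;
```

If the SMB conversion applied, I would expect the plan to show a Sorted Merge Bucket Map Join Operator. A Map Join Operator with local work and a HashTable Sink Operator would suggest the query fell back to a regular (bucket) map join. As far as I recall, conversion without a MAPJOIN hint also depends on hive.auto.convert.sortmerge.join, but treat that as an assumption to verify against your version.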

Thanks,
Navis

2014-07-31 4:08 GMT+09:00 Pala M Muthaia <mchett...@rocketfuelinc.com>:

> +hive-users
>
> On Tue, Jul 29, 2014 at 1:56 PM, Pala M Muthaia <mchett...@rocketfuelinc.com> wrote:
>
> > Hi,
> >
> > I am testing SMB join for 2 large tables. The tables are bucketed and
> > sorted on the join column. I notice that even though the table is large,
> > Hive attempts to generate hash table for the 'small' table locally,
> > similar to map join. Since the table is large in my case, the client runs
> > out of memory and the query fails.
> >
> > I am using Hive 0.12 with the following settings:
> >
> > set hive.optimize.bucketmapjoin=true;
> > set hive.optimize.bucketmapjoin.sortedmerge=true;
> > set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> >
> > My test query does a simple join and a select, no subqueries/nested
> > queries etc.
> >
> > I understand why a (bucket) map join requires hash table generation, but
> > why is that included for an SMB join? Shouldn't a SMB join just spin up one
> > mapper for each bucket and perform a sort merge join directly on the mapper?
> >
> > Thanks,
> > pala