Re: Why does SMB join generate hash table locally, even if input tables are large?

Navis류승우 Sun, 03 Aug 2014 19:22:26 -0700

I don't think hash table generation is needed for SMB joins. Could you
check the output of EXPLAIN EXTENDED for the query?


Thanks,
Navis


2014-07-31 4:08 GMT+09:00 Pala M Muthaia <mchett...@rocketfuelinc.com>:

> +hive-users
>
>
> On Tue, Jul 29, 2014 at 1:56 PM, Pala M Muthaia <mchett...@rocketfuelinc.com> wrote:
>
> > Hi,
> >
> > I am testing an SMB join on 2 large tables. The tables are bucketed and
> > sorted on the join column. I notice that even though both tables are large,
> > Hive attempts to generate a hash table for the 'small' table locally,
> > similar to a map join. Since that table is large in my case, the client
> > runs out of memory and the query fails.
> >
> > I am using Hive 0.12 with the following settings:
> >
> > set hive.optimize.bucketmapjoin=true;
> > set hive.optimize.bucketmapjoin.sortedmerge=true;
> > set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> >
> > My test query does a simple join and a select, no subqueries/nested
> > queries etc.
> >
> > I understand why a (bucket) map join requires hash table generation, but
> > why is that included for an SMB join? Shouldn't an SMB join just spin up
> > one mapper for each bucket and perform a sort-merge join directly in the
> > mapper?
> >
> >
> > Thanks,
> > pala
> >
> >
> >
> >
>
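
For illustration, the per-bucket merge described in the question above can be
sketched roughly as follows. This is not Hive's implementation; the function
names sort_merge_join and smb_join are hypothetical, rows are assumed to be
(key, value) tuples, and both inputs are assumed to have the same number of
buckets, each already sorted on the join key. The point is that matching rows
are found by advancing two cursors, so no hash table of either side is built.

```python
def sort_merge_join(left, right):
    """Inner-join two lists of (key, value) rows, each already sorted by key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1          # left key too small, advance left cursor
        elif lk > rk:
            j += 1          # right key too small, advance right cursor
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][0] == lk:
                for r in right[j:j_end]:
                    out.append((lk, left[i][1], r[1]))
                i += 1
            j = j_end
    return out


def smb_join(left_buckets, right_buckets):
    """Join bucket b of the left input only with bucket b of the right input.

    Assumes both inputs were bucketed on the join key with the same bucket
    count, so equal keys always land in buckets with the same index.
    """
    out = []
    for lb, rb in zip(left_buckets, right_buckets):
        out.extend(sort_merge_join(lb, rb))
    return out


# Hypothetical data: two buckets per table (even keys, odd keys).
left_buckets = [[(2, "a"), (4, "b"), (4, "c")], [(1, "d"), (3, "e")]]
right_buckets = [[(4, "x"), (6, "y")], [(1, "z"), (1, "w"), (5, "v")]]
result = smb_join(left_buckets, right_buckets)
```

Only the current run of equal keys is held at any time, which is why this
style of join should not need memory proportional to the size of either table.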
