Hi,
Thanks for the quick response. I tried the query:
insert overwrite table join_result
select /*+ MAPJOIN(m)*/ m.mid, m.param, r.rating
from data r JOIN param m ON (r.mid = m.mid);
param has only 17k rows with 2 columns.
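For reference, the behaviour I expect from the MAPJOIN hint is roughly the following. This is only a Python sketch of a generic build-and-probe hash join, not Hive's MapJoinOperator; the function name and the sample rows are made up for illustration.

```python
from collections import defaultdict


def map_side_hash_join(small_rows, large_rows):
    """Inner-join (mid, param) rows with (mid, rating) rows on mid."""
    # Build phase: load the small table into an in-memory hash table.
    table = defaultdict(list)
    for mid, param in small_rows:
        table[mid].append(param)

    # Probe phase: stream the large table once; no sort or shuffle is needed.
    for mid, rating in large_rows:
        # .get() avoids inserting empty entries for unmatched keys.
        for param in table.get(mid, ()):
            yield (mid, param, rating)


# Hypothetical sample rows standing in for the param and data tables.
param = [(1, "a"), (2, "b")]
data = [(1, 5), (2, 3), (1, 4), (3, 2)]
result = list(map_side_hash_join(param, data))
```

With these sample rows, result holds the three matching (mid, param, rating) tuples; the row with mid 3 has no match in param and is dropped, as in an inner join.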
I got this exception:
java.lang.RuntimeException
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:182)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198)
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.createForwardJoinObject(CommonJoinOperator.java:283)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genObject(CommonJoinOperator.java:530)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genObject(CommonJoinOperator.java:519)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genObject(CommonJoinOperator.java:519)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:560)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:299)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:374)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:580)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:42)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:374)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:580)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:320)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:165)
... 3 more
Additionally, the query compiled into two MR jobs. The second one never
started because the first failed, and I could not work out what the second
job is for.
I am using Hive trunk, revision 811082 updated on 09/03.
Thanks
Sudipto
PhD Candidate
CS @ UCSB
Santa Barbara, CA 93106, USA
http://www.cs.ucsb.edu/~sudipto
On Wed, Sep 9, 2009 at 2:03 PM, Namit Jain <[email protected]> wrote:
> You can specify it as a hint in the select list:
>
> select /*+ MAPJOIN(b) */ … from T a JOIN T2 b on …
>
> In the example above, T2 is the small table which can be cached in memory.
>
> *From:* [email protected] [mailto:[email protected]] *On Behalf Of* Sudipto Das
> *Sent:* Wednesday, September 09, 2009 2:01 PM
> *To:* [email protected]
> *Subject:* Directing Hive to perform Hash Join for small inner tables
>
> Hi,
>
> I am new to Hive, so pardon me if this is something obvious that I
> missed in the documentation.
>
> I have an application where I am joining a small inner table with a really
> large outer table. The inner table is small enough to fit into memory at
> each mapper. In such a case, putting the inner table into an in-memory hash
> table and performing a hash-based join is much more efficient than
> performing the sort-merge join which the JOIN operator selects. Is there a
> way in Hive to instruct it to perform the hash-based join?
>
> Thanks
>
> Sudipto
>
> PhD Candidate
> CS @ UCSB
> Santa Barbara, CA 93106, USA
> http://www.cs.ucsb.edu/~sudipto
>