This should be fixed as part of

https://issues.apache.org/jira/browse/HIVE-405



Can you try in the trunk version and let us know if you still see any problems ?



From: Jason Michael [mailto:[email protected]]
Sent: Tuesday, July 14, 2009 3:01 PM
To: hive mailing list
Subject: NPEs when using map-side join

>From a previous thread I learned about map side joins.  This sounds like 
>exactly what we need since we are typically joining several small tables to a 
>larger fact table.  However, I'm getting some NPEs when trying to use them in 
>a query.

The tables look something like:

hive> describe dim_1;
OK
key    int
dim_2_key    int
startdate    string
enddate    string
lastmodified    string

hive> describe dim_2;
OK
key    int
dim_3_key    int
name    string
<...>
lastmodified    string

hive> describe dim_3;
OK
key    int
name    string
description    string
lastmodified    string

So there is a one-to-many relationship from dim_3 to dim_2, and from dim_2 to 
dim_1.  Each table contains a few thousand rows.  The fact table contains many 
millions of rows and looks something like:

hive> describe fact;
OK
fact_key    string
measure1    int
measure2    int
measure3    double
dim_1_key    int
day    string
hour    int

Not every record in the fact table has a dim_1_key.  That is, it is sometimes 
null.

The query I'm trying to run looks something like:

select /*+ MAPJOIN(dim_1, dim_2, dim_3) */ dim_3.name, sum(measure1), 
sum(measure2)
from fact join dim_1 on fact.dim_1_key = dim_1.key
    join dim_2 on dim_2.key = dim_1.dim_2_key
    join dim_3 on dim_3.key = dim_2.dim_3_key
where fact.day='20090601' and fact.hour = 1
group by dim_3.name;

And finally the error I'm getting in some of the mappers is:

java.lang.RuntimeException: Error while closing operators
    at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:208)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2210)
Caused by: java.lang.NullPointerException
    at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.close(MapJoinOperator.java:333)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:383)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:383)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:188)
    ... 3 more

Any help greatly appreciated!

Jason

Reply via email to