>From a previous thread I learned about map side joins. This sounds like
>exactly what we need since we are typically joining several small tables to a
>larger fact table. However, I'm getting some NPEs when trying to use them in
>a query.
The tables look something like:
hive> describe dim_1;
OK
key int
dim_2_key int
startdate string
enddate string
lastmodified string
hive> describe dim_2;
OK
key int
dim_3_key int
name string
<...>
lastmodified string
hive> describe dim_3;
OK
key int
name string
description string
lastmodified string
So there is a one-to-many relationship from dim_3 to dim_2, and from dim_2 to
dim_1. Each table contains a few thousand rows. The fact table contains many
millions of rows and looks something like:
hive> describe fact;
OK
fact_key string
measure1 int
measure2 int
measure3 double
dim_1_key int
day string
hour int
Not every record in the fact table has a dim_1_key. That is, it is sometimes
null.
The query I'm trying to run looks something like:
select /*+ MAPJOIN(dim_1, dim_2, dim_3) */ dim_3.name, sum(measure1),
sum(measure2)
from fact join dim_1 on fact.dim_1_key = dim_1.key
join dim_2 on dim_2.key = dim_1.dim_2_key
join dim_3 on dim_3.key = dim_2.dim_3_key
where fact.day='20090601' and fact.hour = 1
group by dim_3.name;
And finally the error I'm getting in some of the mappers is:
java.lang.RuntimeException: Error while closing operators
at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:208)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2210)
Caused by: java.lang.NullPointerException
at
org.apache.hadoop.hive.ql.exec.MapJoinOperator.close(MapJoinOperator.java:333)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:383)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:383)
at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:188)
... 3 more
Any help greatly appreciated!
Jason