Hi all,
We are using Hive version 0.4.2. I think that version has a bug in
MAPJOIN that has been fixed in the current version.
The query is shown below:
------------------------
SELECT
t.query,
t.query_flag,
count(t.ipv) as ipv
, t.ds
FROM(
SELECT /*+ MAPJOIN(d) */
c.query,
c.query_flag,
c.ipv,
d.ssid_classified_id
, c.ds
FROM
DM_FACT c
LEFT OUTER JOIN
DIM_SSID_MAP d
ON
( c.ds=20100506 AND
c.ss_id = d.ss_id)
) t
GROUP BY
t.query,
t.query_flag
, t.ssid_classified_id
, t.ds
---------------------------
The query above returns wrong results; the output fields appear to be
mismatched. We expect results like:
...
blah 00001 1 20100506
blahblah 00001 1 20100506
blah 00010 1 20100506
...
But 0.4.2 produces:
...
0.0 20100506 20 NULL
0.0 20100506 37 0
0.0 20100506 7 1
1.0 20100506 6 NULL
...
The results look correct if either the MAPJOIN hint or the GROUP BY
clause is removed.
Could someone please tell me which patch fixed this bug? Thanks.
--
Best Regards,
Ted Xu