Hi all,

We are using Hive 0.4.2. I think that version has a bug in mapjoin
that has already been fixed in the current version.

The query is shown below:

------------------------
 SELECT
    t.query,
    t.query_flag,
    count(t.ipv) as ipv
    , t.ds
FROM(
        SELECT /*+ MAPJOIN(d) */
        c.query,
        c.query_flag,
        c.ipv,
        d.ssid_classified_id
        , c.ds
        FROM
        DM_FACT c
        LEFT OUTER JOIN
        DIM_SSID_MAP d
        ON
        ( c.ds=20100506 AND
        c.ss_id = d.ss_id)
    ) t
GROUP BY
    t.query,
    t.query_flag
    , t.ssid_classified_id
    , t.ds

---------------------------

The query above produces wrong results; it looks like the output fields
are mismatched. We expect results like:

...
blah    00001    1    20100506
blahblah    00001    1    20100506
blah    00010    1    20100506
...

But in 0.4.2 it generates:

...
0.0    20100506    20    NULL
0.0    20100506    37    0
0.0    20100506    7    1
1.0    20100506    6    NULL
...


Also, the results look OK when either the *MapJoin* hint or the
*Group by* clause is removed.

Could someone please tell me which patch fixed this bug? Thanks.

-- 
Best Regards,
Ted Xu
