Luis E Martinez-Poblete created IMPALA-7957:
-----------------------------------------------

             Summary: UNION ALL query returns incorrect results
                 Key: IMPALA-7957
                 URL: https://issues.apache.org/jira/browse/IMPALA-7957
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 2.12.0
            Reporter: Luis E Martinez-Poblete


Synopsis:
=========
UNION ALL query returns incorrect results

Problem:
========
Customer reported a UNION ALL query returning incorrect results. The UNION ALL 
query has 2 legs, but Impala is only returning information from one leg.

Issue can be reproduced in the latest version of Impala. Below is the 
reproduction case:

create table mytest_t (c1 timestamp, c2 timestamp, c3 int, c4 int);
insert into mytest_t values (now(), ADDDATE (now(),1), 1,1);
insert into mytest_t values (now(), ADDDATE (now(),1), 2,2);
insert into mytest_t values (now(), ADDDATE (now(),1), 3,3);

SELECT t.c1
FROM
 (SELECT c1, c2
 FROM mytest_t) t
LEFT JOIN
 (SELECT c1, c2
 FROM mytest_t
 WHERE c2 = c1) t2 ON (t.c2 = t2.c2)
UNION ALL
VALUES (NULL)


The above query produces the following execution plan:

+------------------------------------------------------------------------------------+
| Explain String |
+------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=34.02MB Threads=5 |
| Per-Host Resource Estimates: Memory=2.06GB |
| WARNING: The following tables are missing relevant table and/or column 
statistics. |
| default.mytest_t |
| |
| PLAN-ROOT SINK |
| | |
| 06:EXCHANGE [UNPARTITIONED] |
| | |
| 00:UNION |
| | constant-operands=1 |
| | |
| 04:SELECT |
| | predicates: default.mytest_t.c1 = default.mytest_t.c2 |
| | |
| 03:HASH JOIN [LEFT OUTER JOIN, BROADCAST] |
| | hash predicates: c2 = c2 |
| | |
| |--05:EXCHANGE [BROADCAST] |
| | | |
| | 02:SCAN HDFS [default.mytest_t] |
| | partitions=1/1 files=3 size=192B |
| | predicates: c2 = c1 |
| | |
| 01:SCAN HDFS [default.mytest_t] |
| partitions=1/1 files=3 size=192B |
+------------------------------------------------------------------------------------+

The issue is in operator 4:

| 04:SELECT |
| | predicates: default.mytest_t.c1 = default.mytest_t.c2 |

It's definitely a bug with predicate placement - that c1 = c2 predicate 
shouldn't be evaluated outside the right branch of the LEFT JOIN.

Thanks,
Luis Martinez.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to