[
https://issues.apache.org/jira/browse/IMPALA-7782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16668899#comment-16668899
]
Tim Armstrong commented on IMPALA-7782:
---------------------------------------
Looks like a bad rewrite in the planner. The plan doesn't make any sense to me,
it seems like it did some incorrect optimisation based on the subquery being
empty:
{noformat}
**
[localhost:21000] default> use functional;
Query: use functional
[localhost:21000] functional> explain
> SELECT id
> FROM alltypestiny
> WHERE -1 NOT IN (SELECT COUNT(id) FROM
alltypestiny HAVING false);
Query: explain SELECT id
FROM alltypestiny
WHERE -1 NOT IN (SELECT COUNT(id) FROM alltypestiny HAVING false)
+------------------------------------------------------------+
| Explain String |
+------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=8.00KB Threads=4 |
| Per-Host Resource Estimates: Memory=32MB |
| Codegen disabled by planner |
| |
| PLAN-ROOT SINK |
| | |
| 04:EXCHANGE [UNPARTITIONED] |
| | |
| 02:NESTED LOOP JOIN [CROSS JOIN, BROADCAST] |
| | |
| |--03:EXCHANGE [BROADCAST] |
| | | |
| | 01:EMPTYSET |
| | |
| 00:SCAN HDFS [functional.alltypestiny] |
| partitions=4/4 files=4 size=460B |
+------------------------------------------------------------+
{noformat}
> discrepancy in results with a subquery containing an agg that produces an
> empty set
> -----------------------------------------------------------------------------------
>
> Key: IMPALA-7782
> URL: https://issues.apache.org/jira/browse/IMPALA-7782
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 2.12.0, Impala 3.1.0
> Reporter: Michael Brown
> Priority: Blocker
> Labels: correctness, query_generator
>
> A discrepancy exists between Impala and Postgres when a subquery contains an
> agg and results in an empty set, yet the WHERE clause looking at the subquery
> should produce a "True" condition.
> Example queries include:
> {noformat}
> USE functional;
> SELECT id
> FROM alltypestiny
> WHERE -1 NOT IN (SELECT COUNT(id) FROM alltypestiny HAVING false);
> SELECT id
> FROM alltypestiny
> WHERE NULL NOT IN (SELECT COUNT(id) FROM alltypestiny HAVING false);
> SELECT id
> FROM alltypestiny
> WHERE (SELECT COUNT(id) FROM alltypestiny HAVING false) IS NULL;
> {noformat}
> These queries do not produce any rows in Impala. In Postgres, the queries
> produce all 8 rows for the functional.alltypestiny id column.
> Thinking maybe there were Impala and Postgres differences with {{NOT IN}}
> behavior, I also tried this:
> {noformat}
> USE functional;
> SELECT id
> FROM alltypestiny
> WHERE -1 NOT IN (SELECT 1 FROM alltypestiny WHERE bool_col IS NULL);
> {noformat}
> This subquery also produces an empty set just like the subquery in the
> problematic queries at the top, but unlike those queries, this full query
> returns the same results in Impala and Postgres (all 8 rows for the
> functional.alltypestiny id column).
> For anyone interested in this bug, you can migrate data into postgres in a
> dev environment using
> {noformat}
> tests/comparison/data_generator.py --use-postgresql --migrate-table-names
> alltypestiny --db-name functional migrate
> {noformat}
> This is in 2.12 at least, so it's not a 3.1 regression.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]