[
https://issues.apache.org/jira/browse/IMPALA-9949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17157022#comment-17157022
]
ASF subversion and git services commented on IMPALA-9949:
---------------------------------------------------------
Commit 4e2498da6f94a8da78a077bb9e44ff85a66523d0 in impala's branch
refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=4e2498d ]
IMPALA-9949: fix SELECT list subqueries with HAVING/LIMIT
The patch for IMPALA-8954 failed to account for subqueries
that could produce < 1 row. SelectStmt.returnsSingleRow()
is confusing because it actually returns true if it
returns *at most* one row.
As a fix I split it into returnsExactlyOneRow() and
returnsAtMostOneRow(), then used returnsExactlyOneRow()
to determine if the subquery should instead be rewritten
into a LEFT OUTER JOIN, which produces the correct result.
CROSS JOIN is still preferred because it can be more freely
reordered during planning.
Testing:
* Added planner tests for a range of scenarios where it can
be rewritten as a CROSS JOIN and where it needs to be a LEFT
OUTER JOIN for correctness.
* Added some targeted end-to-end tests where the results were
previously incorrect. Checked the behaviour against Hive and
postgres.
Ran exhaustive tests.
Change-Id: I6034aedac776783bdc8cdb3a2df344e2b3662da6
Reviewed-on: http://gerrit.cloudera.org:8080/16171
Reviewed-by: Tim Armstrong <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Subqueries in select can result in rows not being returned
> ----------------------------------------------------------
>
> Key: IMPALA-9949
> URL: https://issues.apache.org/jira/browse/IMPALA-9949
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Reporter: Tim Armstrong
> Assignee: Tim Armstrong
> Priority: Blocker
> Labels: correctness, regression
>
> IMPALA-8954 added support for uncorrelated subqueries but some do not return
> correct results. Both of those queries should return rows with NULLs where
> the subquery returned 0 rows.
> {noformat}
> [localhost.EXAMPLE.COM:21000] default> select (select min(int_col) from
> functional.alltypes having min(int_col) < 0) from functional.alltypestiny;
> Fetched 0 row(s) in 0.16s
> [localhost.EXAMPLE.COM:21000] default> select (select min(int_col) from
> functional.alltypes limit 0) from functional.alltypestiny;
> Fetched 0 row(s) in 0.14s
> {noformat}
> The problem is that the CROSS JOIN will return 0 rows if the subquery returns
> 0 rows.
> {noformat}
> [localhost.EXAMPLE.COM:21000] default> explain select (select min(int_col)
> from functional.alltypes having min(int_col) < 0) from
> functional.alltypestiny;
> Query: explain select (select min(int_col) from functional.alltypes having
> min(int_col) < 0) from functional.alltypestiny
> +-------------------------------------------------------------+
> | Explain String |
> +-------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=40.00KB Threads=5 |
> | Per-Host Resource Estimates: Memory=180MB |
> | Codegen disabled by planner |
> | |
> | PLAN-ROOT SINK |
> | | |
> | 03:NESTED LOOP JOIN [CROSS JOIN, BROADCAST] |
> | | row-size=4B cardinality=8 |
> | | |
> | |--06:EXCHANGE [UNPARTITIONED] |
> | | | |
> | | 00:SCAN HDFS [functional.alltypestiny] |
> | | HDFS partitions=4/4 files=4 size=460B |
> | | row-size=0B cardinality=8 |
> | | |
> | 05:AGGREGATE [FINALIZE] |
> | | output: min:merge(int_col) |
> | | having: min(int_col) < 0 |
> | | row-size=4B cardinality=1 |
> | | |
> | 04:EXCHANGE [UNPARTITIONED] |
> | | |
> | 02:AGGREGATE |
> | | output: min(int_col) |
> | | row-size=4B cardinality=1 |
> | | |
> | 01:SCAN HDFS [functional.alltypes] |
> | HDFS partitions=24/24 files=24 size=478.45KB |
> | row-size=4B cardinality=7.30K |
> +-------------------------------------------------------------+
> Fetched 29 row(s) in 0.04s
> {noformat}
> We need to detect cases where the subquery can return 0 rows and instead
> insert a left outer join.
> I did this in a patch and it fixed the issue.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]