[
https://issues.apache.org/jira/browse/IMPALA-9949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Armstrong resolved IMPALA-9949.
-----------------------------------
Fix Version/s: Impala 4.0
Resolution: Fixed
> Subqueries in select can result in rows not being returned
> ----------------------------------------------------------
>
> Key: IMPALA-9949
> URL: https://issues.apache.org/jira/browse/IMPALA-9949
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Reporter: Tim Armstrong
> Assignee: Tim Armstrong
> Priority: Blocker
> Labels: correctness, regression
> Fix For: Impala 4.0
>
>
> IMPALA-8954 added support for uncorrelated subqueries but some do not return
> correct results. Both of those queries should return rows with NULLs where
> the subquery returned 0 rows.
> {noformat}
> [localhost.EXAMPLE.COM:21000] default> select (select min(int_col) from
> functional.alltypes having min(int_col) < 0) from functional.alltypestiny;
> Fetched 0 row(s) in 0.16s
> [localhost.EXAMPLE.COM:21000] default> select (select min(int_col) from
> functional.alltypes limit 0) from functional.alltypestiny;
> Fetched 0 row(s) in 0.14s
> {noformat}
> The problem is that the CROSS JOIN will return 0 rows if the subquery returns
> 0 rows.
> {noformat}
> [localhost.EXAMPLE.COM:21000] default> explain select (select min(int_col)
> from functional.alltypes having min(int_col) < 0) from
> functional.alltypestiny;
> Query: explain select (select min(int_col) from functional.alltypes having
> min(int_col) < 0) from functional.alltypestiny
> +-------------------------------------------------------------+
> | Explain String |
> +-------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=40.00KB Threads=5 |
> | Per-Host Resource Estimates: Memory=180MB |
> | Codegen disabled by planner |
> | |
> | PLAN-ROOT SINK |
> | | |
> | 03:NESTED LOOP JOIN [CROSS JOIN, BROADCAST] |
> | | row-size=4B cardinality=8 |
> | | |
> | |--06:EXCHANGE [UNPARTITIONED] |
> | | | |
> | | 00:SCAN HDFS [functional.alltypestiny] |
> | | HDFS partitions=4/4 files=4 size=460B |
> | | row-size=0B cardinality=8 |
> | | |
> | 05:AGGREGATE [FINALIZE] |
> | | output: min:merge(int_col) |
> | | having: min(int_col) < 0 |
> | | row-size=4B cardinality=1 |
> | | |
> | 04:EXCHANGE [UNPARTITIONED] |
> | | |
> | 02:AGGREGATE |
> | | output: min(int_col) |
> | | row-size=4B cardinality=1 |
> | | |
> | 01:SCAN HDFS [functional.alltypes] |
> | HDFS partitions=24/24 files=24 size=478.45KB |
> | row-size=4B cardinality=7.30K |
> +-------------------------------------------------------------+
> Fetched 29 row(s) in 0.04s
> {noformat}
> We need to detect cases where the subquery can return 0 rows and instead
> insert a left outer join.
> I did this in a patch and it fixed the issue.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]