[ 
https://issues.apache.org/jira/browse/IMPALA-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-4373:
-------------------------------------

    Assignee:     (was: Alexander Behm)

> Wrong results with correlated WHERE-clause subquery inside a NULL-checking 
> conditional function.
> ------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-4373
>                 URL: https://issues.apache.org/jira/browse/IMPALA-4373
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 2.5.0, Impala 2.4.0, Impala 2.6.0, Impala 2.7.0, 
> Impala 2.8.0, Impala 2.9.0
>            Reporter: Alexander Behm
>            Priority: Critical
>              Labels: correctness
>
> Impala may generate an incorrect plan for queries that have a correlated 
> scalar subquery as a parameter to a NULL-checking conditional function like 
> ISNULL().
> Example query and incorrect plan:
> {code}
> select t1.int_col
> from functional.alltypessmall as t1
> where t1.int_col >= isnull
> (
>    (
>     SELECT 
>      MAX(t2.bigint_col)
>     FROM 
>      functional.alltypestiny AS t2 
>     WHERE 
>      t1.id = t2.id + 10000
>     ),
>    0  
> )
> Fetched 0 row(s) in 1.09s
> Single-node plan:
> +-----------------------------------------------------------------------+
> | Explain String                                                        |
> +-----------------------------------------------------------------------+
> | Estimated Per-Host Requirements: Memory=0B VCores=0                   |
> |                                                                       |
> | PLAN-ROOT SINK                                                        |
> | |                                                                     |
> | 03:HASH JOIN [LEFT SEMI JOIN]                                         |
> | |  hash predicates: t1.id = t2.id + 10000                             |
> | |  other join predicates: t1.int_col >= isnull(max(t2.bigint_col), 0) |
> | |  runtime filters: RF000 <- t2.id + 10000                            |
> | |                                                                     |
> | |--02:AGGREGATE [FINALIZE]                                            |
> | |  |  output: max(t2.bigint_col)                                      |
> | |  |  group by: t2.id                                                 |
> | |  |                                                                  |
> | |  01:SCAN HDFS [functional.alltypestiny t2]                          |
> | |     partitions=4/4 files=4 size=460B                                |
> | |                                                                     |
> | 00:SCAN HDFS [functional.alltypessmall t1]                            |
> |    partitions=4/4 files=4 size=6.32KB                                 |
> |    runtime filters: RF000 -> t1.id                                    |
> +-----------------------------------------------------------------------+
> {code}
> The query returns an empty result set but instead should return all rows from 
> t1 because all invocations of the subquery return NULL, and all rows from t1 
> satisfy "t1.int_col >= 0".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to