[ 
https://issues.apache.org/jira/browse/IMPALA-6286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Behm resolved IMPALA-6286.
------------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.11.0

commit 09f6b7aa733066e7bc7247b74d24a1d8b5549cf3
Author: Alex Behm <[email protected]>
Date:   Wed Dec 6 15:11:38 2017 -0800

    IMPALA-6286: Remove invalid runtime filter targets.
    
    If the target expression of a runtime filter evaluates to a
    non-NULL value for outer-join non-matches, then assigning
    the filter below the nullable side of an outer join may
    lead to incorrect query results.
    See IMPALA-6286 for an example and explanation.
    
    This patch adds a conservative check that prevents the
    creation of runtime filters that could potentially
    have such incorrect targets. Some safe opportunities
    are deliberately missed to keep the code simple.
    See RuntimeFilterGenerator#getTargetSlots().
    
    Testing:
    - added planner tests which passed locally
    
    Change-Id: I88153eea9f4b5117df60366fad2bd91776b95298
    Reviewed-on: http://gerrit.cloudera.org:8080/8783
    Reviewed-by: Alex Behm <[email protected]>
    Tested-by: Impala Public Jenkins


> Wrong results with outer join and RUNTIME_FILTER_MODE=GLOBAL
> ------------------------------------------------------------
>
>                 Key: IMPALA-6286
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6286
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 2.5.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, 
> Impala 2.9.0, Impala 2.10.0
>            Reporter: Alexander Behm
>            Assignee: Alexander Behm
>            Priority: Blocker
>              Labels: correctness, planner, runtime-filters
>             Fix For: Impala 2.11.0
>
>
> Queries with the following characteristics may produce wrong results due to 
> an incorrectly assigned runtime filter:
> * The query option RUNTIME_FILTER_MODE is set to GLOBAL
> * The query has an outer join
> * A scan on the nullable side of that outer join has a runtime filter with a 
> NULL-checking expression such as COALESCE/IFNULL/CASE
> * The latter point imples that there is another join above the outer join 
> with a NULL-checking expression in it's join condition
> Reproduction:
> {code}
> select count(*) from functional.alltypestiny t1
> left outer join functional.alltypestiny t2
>   on t1.id = t2.id
> where coalesce(t2.id + 10, 100) in (select 100)
> +----------+
> | count(*) |
> +----------+
> | 8        |
> +----------+
> {code}
> We expect a count of 0. A count of 8 is incorrect. 
> Query plan:
> {code}
> +---------------------------------------------------------------+
> | Explain String                                                |
> +---------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=3.88MB              |
> | Per-Host Resource Estimates: Memory=87.88MB                   |
> | Codegen disabled by planner                                   |
> |                                                               |
> | PLAN-ROOT SINK                                                |
> | |                                                             |
> | 10:AGGREGATE [FINALIZE]                                       |
> | |  output: count:merge(*)                                     |
> | |                                                             |
> | 09:EXCHANGE [UNPARTITIONED]                                   |
> | |                                                             |
> | 05:AGGREGATE                                                  |
> | |  output: count(*)                                           |
> | |                                                             |
> | 04:HASH JOIN [LEFT SEMI JOIN, BROADCAST]                      |
> | |  hash predicates: coalesce(t2.id + 10, 100) = `$a$1`.`$c$1` |
> | |  runtime filters: RF000 <- `$a$1`.`$c$1`                    |
> | |                                                             |
> | |--08:EXCHANGE [BROADCAST]                                    |
> | |  |                                                          |
> | |  02:UNION                                                   |
> | |     constant-operands=1                                     |
> | |                                                             |
> | 03:HASH JOIN [LEFT OUTER JOIN, PARTITIONED]                   |
> | |  hash predicates: t1.id = t2.id                             |
> | |                                                             |
> | |--07:EXCHANGE [HASH(t2.id)]                                  |
> | |  |                                                          |
> | |  01:SCAN HDFS [functional.alltypestiny t2]                  |
> | |     partitions=4/4 files=4 size=460B                        |
> | |     runtime filters: RF000 -> coalesce(t2.id + 10, 100)  <--- This 
> runtime filter is not correct   |
> | |                                                             |
> | 06:EXCHANGE [HASH(t1.id)]                                     |
> | |                                                             |
> | 00:SCAN HDFS [functional.alltypestiny t1]                     |
> |    partitions=4/4 files=4 size=460B                           |
> +---------------------------------------------------------------+
> {code}
> Explanation:
> * RF000 filters out all rows in scan 01
> * In join 03 there are no join matches since the right-hand is empty. All 
> rows from the right-hand side are nulled.
> * The join condition in join 04 now satisfies all input rows because every 
> "t2.id" is NULL, so after the COALESCE() the join condition becomes 100 = 100
> *Workaround*
> * Set RUNTIME_FILTER_MODE to LOCAL or OFF



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to