[ https://issues.apache.org/jira/browse/IMPALA-10252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245987#comment-17245987 ]
ASF subversion and git services commented on IMPALA-10252:
----------------------------------------------------------
Commit f684ed72c541fa04dc1841a1aab83a7c9847f1a2 in impala's branch
refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=f684ed7 ]
IMPALA-10252: fix invalid runtime filters for outer joins

The planner generates runtime filters for non-join conjuncts
assigned to LEFT OUTER and FULL OUTER JOIN nodes. This is
correct in many cases where NULLs stemming from unmatched rows
would result in the predicate evaluating to false. E.g.
x = y is always false if y is NULL.

However, it is incorrect if the NULL returned from the unmatched
row can result in the predicate evaluating to true. E.g.
x = isnull(y, 1) can return true even if y is NULL.

The fix is to detect cases when the source expression from the
left input of the join returns non-NULL for null inputs and then
skip generating the filter.

Examples of expressions that may be affected by this change are
COALESCE and ISNULL.
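
The failure mode described above can be illustrated with a small
simulation. This is only a sketch and not Impala's planner code: the
names isnull, returns_non_null_for_null_input and run_query are
hypothetical, None stands in for SQL NULL, and the "runtime filter" is
modelled as a plain set of values computed from the side of the join
that produces NULLs for unmatched rows. The check shown (evaluating the
expression on a NULL input) is a simplified assumption about how such a
detection could work.

```python
def isnull(value, default):
    """Stand-in for SQL ISNULL(value, default); None plays the role of NULL."""
    return default if value is None else value


def returns_non_null_for_null_input(expr):
    """Hypothetical planner-side check: does expr map a NULL input to non-NULL?"""
    return expr(None) is not None


def run_query(probe_rows, build_rows, build_expr, use_runtime_filter):
    """Evaluate: probe LEFT OUTER JOIN build ON probe.id = build.id
    WHERE probe.x = build_expr(build.y).

    probe_rows holds (id, x) tuples and build_rows holds (id, y) tuples.
    """
    if use_runtime_filter:
        # The filter only sees values computed from rows that exist on the
        # build side; it never sees the NULLs produced for unmatched rows.
        allowed = {build_expr(y) for _, y in build_rows} - {None}
        probe_rows = [(i, x) for i, x in probe_rows if x in allowed]

    build_by_id = {}
    for i, y in build_rows:
        build_by_id.setdefault(i, []).append(y)

    result = []
    for i, x in probe_rows:
        # An unmatched probe row is joined with a single NULL build value.
        for y in build_by_id.get(i, [None]):
            rhs = build_expr(y)
            if x is not None and rhs is not None and x == rhs:
                result.append((i, x))
    return result


# Made-up sample data: only probe id 1 has a match on the build side.
probe = [(1, 1), (10, 0), (20, 0)]
build = [(1, 1)]

plain = lambda y: y                    # x = y
with_default = lambda y: isnull(y, 0)  # x = isnull(y, 0)

# x = y: the filter is safe, results agree with and without it.
safe_without = run_query(probe, build, plain, use_runtime_filter=False)
safe_with = run_query(probe, build, plain, use_runtime_filter=True)

# x = isnull(y, 0): the filter wrongly drops the unmatched rows.
expected = run_query(probe, build, with_default, use_runtime_filter=False)
broken = run_query(probe, build, with_default, use_runtime_filter=True)

# Sketch of the fix: skip the filter when NULL input yields non-NULL output.
use_filter = not returns_non_null_for_null_input(with_default)
fixed = run_query(probe, build, with_default, use_runtime_filter=use_filter)
```

In this sketch the unguarded filter keeps only the row with id 1 for the
isnull case, while the guarded plan returns all three rows, matching the
result with filtering disabled.
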
Testing:
Added regression tests:
* Planner tests for LEFT OUTER and FULL OUTER where the runtime
  filter was incorrectly generated before this patch.
* Enabled end-to-end test that was previously failing.
* Added a new runtime filter test that will execute on both
  Parquet and Kudu (which are subtly different because of nullability of
  slots).

Ran exhaustive tests.

Change-Id: I507af1cc8df15bca21e0d8555019997812087261
Reviewed-on: http://gerrit.cloudera.org:8080/16622
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Query returns less number of rows with run-time filtering on integer column
> in a subquery against functional_parquet schema
> ---------------------------------------------------------------------------------------------------------------------------
>
> Key: IMPALA-10252
> URL: https://issues.apache.org/jira/browse/IMPALA-10252
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 2.5.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0,
> Impala 2.9.0, Impala 2.10.0, Impala 2.11.0, Impala 3.0, Impala 2.12.0, Impala
> 3.1.0, Impala 3.2.0, Impala 3.3.0, Impala 3.4.0
> Reporter: Qifan Chen
> Assignee: Tim Armstrong
> Priority: Blocker
> Labels: correctness
> Fix For: Impala 4.0
>
>
> During the work to address IMPALA-6628 (Use unqualified table references in
> .test files run from test_queries.py), it was found that a query against the
> functional_parquet database returns 1 row, while the same query returns 12
> rows when run-time filtering is turned off or when it is run against the
> functional database.
>
>
> {code:java}
> Query: --SET RUNTIME_FILTER_MODE=OFF;
> select id, int_col, year, month
> from functional_parquet.alltypessmall s
> where s.int_col = (select count(*) from functional_parquet.alltypestiny t
> where s.id = t.id)
> order by id
> Query submitted at: 2020-10-18 12:41:15 (Coordinator:
> http://qifan-10229:25000)
> Query progress can be monitored at:
> http://qifan-10229:25000/query_plan?query_id=394a61d8f0002336:fd45e07300000000
> +----+---------+------+-------+
> | id | int_col | year | month |
> +----+---------+------+-------+
> | 1 | 1 | 2009 | 1 |
> +----+---------+------+-------+
> {code}
>
>
> {code:java}
> RUNTIME_FILTER_MODE set to OFF
> Query: select id, int_col, year, month
> from functional_parquet.alltypessmall s
> where s.int_col = (select count(*) from functional_parquet.alltypestiny t
> where s.id = t.id)
> order by id
> Query submitted at: 2020-10-18 12:40:58 (Coordinator:
> http://qifan-10229:25000)
> Query progress can be monitored at:
> http://qifan-10229:25000/query_plan?query_id=304c095f478607fc:7d2d03ff00000000
> +----+---------+------+-------+
> | id | int_col | year | month |
> +----+---------+------+-------+
> | 1 | 1 | 2009 | 1 |
> | 10 | 0 | 2009 | 1 |
> | 20 | 0 | 2009 | 1 |
> | 25 | 0 | 2009 | 2 |
> | 35 | 0 | 2009 | 2 |
> | 45 | 0 | 2009 | 2 |
> | 50 | 0 | 2009 | 3 |
> | 60 | 0 | 2009 | 3 |
> | 70 | 0 | 2009 | 3 |
> | 75 | 0 | 2009 | 4 |
> | 85 | 0 | 2009 | 4 |
> | 95 | 0 | 2009 | 4 |
> +----+---------+------+-------+{code}
>
> Query against the functional database:
> {code:java}
> Query: select id, int_col, year, month
> from functional.alltypessmall s
> where s.int_col = (select count(*) from functional.alltypestiny t where s.id
> = t.id)
> order by id
> Query submitted at: 2020-10-18 12:35:24 (Coordinator:
> http://qifan-10229:25000)
> Query progress can be monitored at:
> http://qifan-10229:25000/query_plan?query_id=104bd5d7a6d5fe74:09a6c09000000000
> +----+---------+------+-------+
> | id | int_col | year | month |
> +----+---------+------+-------+
> | 1 | 1 | 2009 | 1 |
> | 10 | 0 | 2009 | 1 |
> | 20 | 0 | 2009 | 1 |
> | 25 | 0 | 2009 | 2 |
> | 35 | 0 | 2009 | 2 |
> | 45 | 0 | 2009 | 2 |
> | 50 | 0 | 2009 | 3 |
> | 60 | 0 | 2009 | 3 |
> | 70 | 0 | 2009 | 3 |
> | 75 | 0 | 2009 | 4 |
> | 85 | 0 | 2009 | 4 |
> | 95 | 0 | 2009 | 4 |
> +----+---------+------+-------+{code}