[
https://issues.apache.org/jira/browse/IMPALA-10252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17218555#comment-17218555
]
Tim Armstrong commented on IMPALA-10252:
----------------------------------------
I was surprised that we were generating a runtime filter for a LEFT JOIN here.
We generally wouldn't, but it looks like we can generate one from the non-join
conjuncts:
{code}
private void generateFilters(PlannerContext ctx, PlanNode root) {
  if (root instanceof HashJoinNode) {
    HashJoinNode joinNode = (HashJoinNode) root;
    List<Expr> joinConjuncts = new ArrayList<>();
    if (!joinNode.getJoinOp().isLeftOuterJoin()
        && !joinNode.getJoinOp().isFullOuterJoin()
        && !joinNode.getJoinOp().isAntiJoin()) {
      // It's not correct to push runtime filters to the left side of a left outer,
      // full outer or anti join if the filter corresponds to an equi-join predicate
      // from the ON clause.
      joinConjuncts.addAll(joinNode.getEqJoinConjuncts());
    }
    joinConjuncts.addAll(joinNode.getConjuncts());
    // ... (rest of generateFilters() not shown)
{code}
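To make the eligibility rule concrete, here is a minimal, self-contained Java model of the check above. JoinKind, candidateConjuncts() and the string-valued conjuncts are hypothetical stand-ins, not Impala's JoinOperator or Expr classes, and the conjunct text assumes the correlated count(*) subquery is rewritten into a LEFT OUTER JOIN carrying a predicate of roughly the shape s.int_col = zeroifnull(v.cnt). It shows that for a LEFT OUTER join only the non-join conjunct survives as a filter candidate.
{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the conjunct selection in generateFilters();
// these types are illustrative stand-ins, not Impala's planner classes.
public class FilterEligibility {
  enum JoinKind {
    INNER, LEFT_OUTER, RIGHT_OUTER, FULL_OUTER, LEFT_SEMI, RIGHT_SEMI, LEFT_ANTI, RIGHT_ANTI;

    boolean isLeftOuter() { return this == LEFT_OUTER; }
    boolean isFullOuter() { return this == FULL_OUTER; }
    boolean isAnti() { return this == LEFT_ANTI || this == RIGHT_ANTI; }
  }

  // Returns the conjuncts that may be used as runtime filter sources.
  static List<String> candidateConjuncts(
      JoinKind op, List<String> eqJoinConjuncts, List<String> otherConjuncts) {
    List<String> result = new ArrayList<>();
    // Equi-join conjuncts from the ON clause are skipped for left outer,
    // full outer and anti joins.
    if (!op.isLeftOuter() && !op.isFullOuter() && !op.isAnti()) {
      result.addAll(eqJoinConjuncts);
    }
    // Non-join conjuncts are added for every join type, including outer joins.
    result.addAll(otherConjuncts);
    return result;
  }

  public static void main(String[] args) {
    List<String> eq = List.of("s.id = v.id");
    List<String> other = List.of("s.int_col = zeroifnull(v.cnt)");
    // prints [s.id = v.id, s.int_col = zeroifnull(v.cnt)]
    System.out.println(candidateConjuncts(JoinKind.INNER, eq, other));
    // prints [s.int_col = zeroifnull(v.cnt)]
    System.out.println(candidateConjuncts(JoinKind.LEFT_OUTER, eq, other));
  }
}
{code}
The non-join conjunct is what slips through for the outer join, which is how this query ends up with a runtime filter at all.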
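I think the scenario here is:
* Left outer or full outer join
* Non-join equality conjuncts with a non-null-preserving function on a column
from the inner side, e.g. zeroifnull(), which converts a NULL to a non-NULL value.
The problem is that the outer join returns NULLs for the inner side's columns
when a row has no match, and those NULLs never appear in the build-side rows
that the runtime filter is built from. The value the filter expression produces
for them (for example, zeroifnull(NULL) = 0) is therefore missing from the
filter, so rows that would satisfy the predicate after the join are wrongly
filtered out.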
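A toy Java simulation of this scenario, under stated assumptions: the rows are made up (shaped after the query in the report, not taken from the real tables), an exact HashSet stands in for a real runtime filter (Bloom or min-max), and the build side is modeled as a map from t.id to count(*). Because the filter holds only zeroifnull(cnt) evaluated on build rows, probe rows that find no match and would pass int_col = 0 after the join are dropped at the scan.
{code:java}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical simulation of a runtime filter on a LEFT OUTER JOIN's non-join
// conjunct s.int_col = zeroifnull(cnt). Not Impala code.
public class OuterJoinFilterBug {
  // zeroifnull(): NULL becomes 0, other values pass through unchanged.
  static long zeroIfNull(Long v) {
    return v == null ? 0L : v;
  }

  // Probe rows are {id, int_col}; build maps id -> count(*).
  static List<Integer> run(Map<Integer, Long> build, int[][] probe, boolean useFilter) {
    // The filter holds the filter expression evaluated on build rows only.
    Set<Long> filter = new HashSet<>();
    for (Long cnt : build.values()) {
      filter.add(zeroIfNull(cnt));
    }
    List<Integer> out = new ArrayList<>();
    for (int[] row : probe) {
      int id = row[0];
      long intCol = row[1];
      // Runtime filter applied at the probe-side scan, before the join.
      if (useFilter && !filter.contains(intCol)) {
        continue;
      }
      // LEFT OUTER JOIN: an unmatched probe row sees NULL for cnt.
      Long cnt = build.get(id);
      if (intCol == zeroIfNull(cnt)) {
        out.add(id);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    Map<Integer, Long> build = Map.of(1, 1L); // only id 1 matches, with count 1
    int[][] probe = {{1, 1}, {10, 0}, {20, 0}};
    System.out.println(run(build, probe, false)); // [1, 10, 20]
    System.out.println(run(build, probe, true));  // [1]
  }
}
{code}
With the filter the toy query keeps only id 1, mirroring the 1-row versus 12-row difference in the report.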
I think we could fix this fairly directly by modifying the join builder to
insert a row with all NULL slots into the runtime filter in the case of LEFT
and FULL OUTER joins.
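The proposed fix can be sketched in the same toy model: for a LEFT or FULL OUTER join, the builder also evaluates the filter expression on one extra build row whose slots are all NULL, so zeroifnull(NULL) = 0 enters the filter. The class and method names are hypothetical and the HashSet again stands in for a real runtime filter; this illustrates the idea, not the actual change to Impala's join builder.
{code:java}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the proposed fix in a toy model; not Impala code.
public class OuterJoinFilterFix {
  // zeroifnull(): NULL becomes 0, other values pass through unchanged.
  static long zeroIfNull(Long v) {
    return v == null ? 0L : v;
  }

  // Evaluates the filter expression on every build row. For LEFT and FULL
  // OUTER joins it also evaluates it on one all-NULL build row, standing in
  // for the NULLs the join emits for unmatched probe rows.
  static Set<Long> buildFilter(Map<Integer, Long> build, boolean leftOrFullOuter) {
    Set<Long> filter = new HashSet<>();
    for (Long cnt : build.values()) {
      filter.add(zeroIfNull(cnt));
    }
    if (leftOrFullOuter) {
      filter.add(zeroIfNull(null)); // zeroifnull(NULL) = 0
    }
    return filter;
  }

  // LEFT OUTER JOIN of probe rows {id, int_col} against build (id -> count),
  // keeping rows where int_col = zeroifnull(count); the filter is applied first.
  static List<Integer> run(Map<Integer, Long> build, int[][] probe) {
    Set<Long> filter = buildFilter(build, true);
    List<Integer> out = new ArrayList<>();
    for (int[] row : probe) {
      long intCol = row[1];
      if (!filter.contains(intCol)) {
        continue; // rejected by the runtime filter
      }
      if (intCol == zeroIfNull(build.get(row[0]))) {
        out.add(row[0]);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    Map<Integer, Long> build = Map.of(1, 1L);
    int[][] probe = {{1, 1}, {10, 0}, {20, 0}};
    System.out.println(buildFilter(build, true)); // contains 0 and 1
    System.out.println(run(build, probe));        // [1, 10, 20]
  }
}
{code}
The extra value only widens the filter, so it cannot drop rows that the unfiltered plan keeps; the trade-off is that every outer row whose int_col equals the NULL-row value now passes the filter.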
> Query returns less number of rows with run-time filtering on integer column
> in a subquery against functional_parquet schema
> ---------------------------------------------------------------------------------------------------------------------------
>
> Key: IMPALA-10252
> URL: https://issues.apache.org/jira/browse/IMPALA-10252
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Reporter: Qifan Chen
> Assignee: Tim Armstrong
> Priority: Blocker
> Labels: correctness
>
> While working on IMPALA-6628 (Use unqualified table references in .test files
> run from test_queries.py), it was found that a query against the
> functional_parquet database returns 1 row, while the same query returns 12
> rows when runtime filtering is turned off or when it is run against the
> functional database.
>
>
> {code:java}
> Query: --SET RUNTIME_FILTER_MODE=OFF;
> select id, int_col, year, month
> from functional_parquet.alltypessmall s
> where s.int_col = (select count(*) from functional_parquet.alltypestiny t
> where s.id = t.id)
> order by id
> Query submitted at: 2020-10-18 12:41:15 (Coordinator: http://qifan-10229:25000)
> Query progress can be monitored at: http://qifan-10229:25000/query_plan?query_id=394a61d8f0002336:fd45e07300000000
> +----+---------+------+-------+
> | id | int_col | year | month |
> +----+---------+------+-------+
> | 1 | 1 | 2009 | 1 |
> +----+---------+------+-------+
> {code}
>
>
> {code:java}
> RUNTIME_FILTER_MODE set to OFF
> Query: select id, int_col, year, month
> from functional_parquet.alltypessmall s
> where s.int_col = (select count(*) from functional_parquet.alltypestiny t
> where s.id = t.id)
> order by id
> Query submitted at: 2020-10-18 12:40:58 (Coordinator: http://qifan-10229:25000)
> Query progress can be monitored at: http://qifan-10229:25000/query_plan?query_id=304c095f478607fc:7d2d03ff00000000
> +----+---------+------+-------+
> | id | int_col | year | month |
> +----+---------+------+-------+
> | 1 | 1 | 2009 | 1 |
> | 10 | 0 | 2009 | 1 |
> | 20 | 0 | 2009 | 1 |
> | 25 | 0 | 2009 | 2 |
> | 35 | 0 | 2009 | 2 |
> | 45 | 0 | 2009 | 2 |
> | 50 | 0 | 2009 | 3 |
> | 60 | 0 | 2009 | 3 |
> | 70 | 0 | 2009 | 3 |
> | 75 | 0 | 2009 | 4 |
> | 85 | 0 | 2009 | 4 |
> | 95 | 0 | 2009 | 4 |
> +----+---------+------+-------+
> {code}
>
> Query against the functional database:
> {code:java}
> Query: select id, int_col, year, month
> from functional.alltypessmall s
> where s.int_col = (select count(*) from functional.alltypestiny t where s.id = t.id)
> order by id
> Query submitted at: 2020-10-18 12:35:24 (Coordinator: http://qifan-10229:25000)
> Query progress can be monitored at: http://qifan-10229:25000/query_plan?query_id=104bd5d7a6d5fe74:09a6c09000000000
> +----+---------+------+-------+
> | id | int_col | year | month |
> +----+---------+------+-------+
> | 1 | 1 | 2009 | 1 |
> | 10 | 0 | 2009 | 1 |
> | 20 | 0 | 2009 | 1 |
> | 25 | 0 | 2009 | 2 |
> | 35 | 0 | 2009 | 2 |
> | 45 | 0 | 2009 | 2 |
> | 50 | 0 | 2009 | 3 |
> | 60 | 0 | 2009 | 3 |
> | 70 | 0 | 2009 | 3 |
> | 75 | 0 | 2009 | 4 |
> | 85 | 0 | 2009 | 4 |
> | 95 | 0 | 2009 | 4 |
> +----+---------+------+-------+
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)