[
https://issues.apache.org/jira/browse/IMPALA-12089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770185#comment-17770185
]
ASF subversion and git services commented on IMPALA-12089:
----------------------------------------------------------
Commit 1f82106aff6fc2f0afb5a2c8ed754463955813a4 in impala's branch
refs/heads/master from Peter Rozsa
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=1f82106af ]
IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg
This change adds a predicate filtering mechanism at planning time that
locates Impala's predicates in the residual expressions from Iceberg
planning. By locating all residual expressions, the remainder
expression set can be calculated.
The current implementation is an all-or-nothing filter: if 'planFiles()'
(Iceberg API) returns no residual expression, all Impala predicates can
be skipped; if there is any residual expression, every Impala predicate
is pushed down to the Impala scanner.
Residual expressions are the remaining filter expressions after the
pushdown of predicates into the Iceberg table scan. By locating the
remainder expression, we can reduce the number of predicates that will
be pushed down to the Impala scanner.
After this change, the Iceberg residual expression handling is improved
by locating the simple conjuncts in the residual expression and mapping
them back to Impala conjuncts. For example, if the list of Impala
conjuncts consists of two predicates 'col_i != 100' and 'col_s = "a"',
'col_i' happens to be a partition column in the Iceberg table
definition, and the Iceberg table scan can eliminate that expression,
the residual expression will be 'col_s = "a"'. This expression can be
mapped back to an Impala predicate and pushed down to the scanner, while
every other expression is removed from the effective Impala conjunct
list, skipping the unnecessary filtering of 'col_i'.
If there's no residual expression, the behavior is the same as before:
all predicate pushdown is skipped.
If Impala is unable to match all residual expressions to Impala
conjuncts, then all the conjuncts are pushed down to the Impala scanner.
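The three outcomes described above (no residual, fully matched residual,
unmatched residual) can be sketched as follows. This is only an
illustration, not the code from the commit: the function name
select_pushdown_conjuncts is hypothetical, and predicates are modeled as
plain strings, assuming that Impala conjuncts and Iceberg residuals have
already been normalized to a comparable form.

```python
from typing import List


def select_pushdown_conjuncts(impala_conjuncts: List[str],
                              residuals: List[str]) -> List[str]:
    """Pick the Impala conjuncts that still need to reach the scanner.

    Hypothetical helper: predicates are compared as normalized strings,
    which stands in for whatever matching the planner really performs.
    """
    # No residual: Iceberg already applied every predicate, skip pushdown.
    if not residuals:
        return []

    known = set(impala_conjuncts)
    # Any residual that cannot be located falls back to pushing everything.
    if any(residual not in known for residual in residuals):
        return list(impala_conjuncts)

    # All residuals located: keep only the matching conjuncts, in the
    # original Impala order.
    remaining = set(residuals)
    return [c for c in impala_conjuncts if c in remaining]
```

With the conjuncts 'col_i != 100' and 'col_s = "a"' and the residual
'col_s = "a"', the sketch keeps only the 'col_s' conjunct for the
scanner.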
This change offers the advantage of not pushing down already evaluated
filters to the Impala scanner nodes, resulting in enhanced scanning
performance. Additionally, if the filter expression affects columns that
are unnecessary for the final result and can be filtered out during
Iceberg's table scan, it leads to a reduced row size, thereby optimizing
data retrieval and improving overall query efficiency.
This solution is limited to cases where Impala's expression list
contains only conjuncts. Compound expressions are not supported because
partial elimination of compounds would involve expression rewrites in
the Impala expression.
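To illustrate why only conjuncts are handled, the sketch below flattens
a tree of AND nodes into a list of simple predicates and gives up as soon
as it meets any other compound node. The classes Pred, And and Or and the
function simple_conjuncts are hypothetical stand-ins, not Impala or
Iceberg types; the point is only that an AND splits into independent
parts while a disjunction does not.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class Pred:
    """A simple predicate such as col_s = "a" (hypothetical model)."""
    text: str


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[Pred, And, Or]


def simple_conjuncts(expr: Expr) -> Optional[List[Pred]]:
    """Flatten nested ANDs into simple predicates.

    Returns None when a non-AND compound is found, because removing one
    side of such a node would require rewriting the expression.
    """
    if isinstance(expr, Pred):
        return [expr]
    if isinstance(expr, And):
        left = simple_conjuncts(expr.left)
        right = simple_conjuncts(expr.right)
        if left is None or right is None:
            return None
        return left + right
    return None
```

A caller that receives None would treat the input as unsupported and
keep the previous behavior of pushing down every predicate.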
A new query option is added: iceberg_predicate_pushdown_subsetting. The
query option's default value is true. It can be turned off by setting it
to false.
Performance of the predicate location is measured on two edge cases:
- 1000 expressions, 999 skipped: on average 2 ms
- 1000 expressions, 1 skipped: on average 25 ms
Tests:
- planner test cases added for disabled mode
- existing planner test cases adjusted
- core tests passed
Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Reviewed-on: http://gerrit.cloudera.org:8080/20133
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Be able to skip pushing down a subset of the predicates
> -------------------------------------------------------
>
> Key: IMPALA-12089
> URL: https://issues.apache.org/jira/browse/IMPALA-12089
> Project: IMPALA
> Issue Type: Sub-task
> Components: Frontend
> Reporter: Gabor Kaszab
> Assignee: Peter Rozsa
> Priority: Major
> Labels: impala-iceberg, performance
>
> https://issues.apache.org/jira/browse/IMPALA-11701 introduced logic to skip
> pushing down predicates to Impala scanners if they are already applied by
> Iceberg and won't filter any further rows. This is an "all or nothing"
> approach where we either skip pushing down all the predicates or we push down
> all of them.
> As a more sophisticated approach we should be able to push down a subset of
> the predicates to Impala Scan nodes. For this we should be able to map
> Iceberg predicates (returned from residual()) to Impala predicates. This
> might not be that trivial as Iceberg sometimes doesn't return the exact same
> predicates as it received through planFiles(). E.g. the object ID might be
> different making the mapping more difficult.