Joe McDonnell created IMPALA-14510:
--------------------------------------
             Summary: Runtime filters with low effectiveness should be disabled more aggressively
Key: IMPALA-14510
URL: https://issues.apache.org/jira/browse/IMPALA-14510
Project: IMPALA
Issue Type: Task
Components: Backend
Affects Versions: Impala 5.0.0
Reporter: Joe McDonnell
There is an existing check of a runtime filter's effectiveness in
HdfsScanner::CheckFiltersEffectiveness(), which can disable runtime filters that
are always true or whose observed reject ratio falls below
min_filter_reject_ratio (default value 0.1). This check runs every
BATCHES_PER_FILTER_SELECTIVITY_CHECK (16) produced row batches:
{noformat}
  // Always add batch to the queue because it may contain data referenced by previously
  // appended batches.
  scan_node->AddMaterializedRowBatch(move(batch));
  RETURN_IF_ERROR(status);
  ++row_batches_produced_;
  if ((row_batches_produced_ & (BATCHES_PER_FILTER_SELECTIVITY_CHECK - 1)) == 0) {
    CheckFiltersEffectiveness();
  }{noformat}
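For reference, the check itself boils down to comparing each filter's observed
reject ratio against the threshold. The following is a minimal standalone sketch
of that logic, not the actual Impala implementation; the FilterStats struct and
MaybeDisableIneffectiveFilters() are illustrative names, while the 0.1 default
for min_filter_reject_ratio matches the description above.
{noformat}
#include <cstdint>
#include <iostream>
#include <vector>

// Illustrative per-filter counters (hypothetical names, not Impala's structs).
struct FilterStats {
  int64_t rows_processed = 0;  // rows the filter was evaluated against
  int64_t rows_rejected = 0;   // rows the filter removed
  bool always_true = false;    // filter is known to pass every row
  bool enabled = true;
};

// Disable any filter that is always true or whose observed reject ratio falls
// below the threshold (min_filter_reject_ratio defaults to 0.1).
void MaybeDisableIneffectiveFilters(std::vector<FilterStats>& filters,
                                    double min_filter_reject_ratio = 0.1) {
  for (FilterStats& f : filters) {
    if (!f.enabled || f.rows_processed == 0) continue;
    double reject_ratio =
        static_cast<double>(f.rows_rejected) / static_cast<double>(f.rows_processed);
    if (f.always_true || reject_ratio < min_filter_reject_ratio) {
      f.enabled = false;  // stop evaluating this filter on future rows
    }
  }
}

int main() {
  // Mirrors the TPC-DS Q13 counters quoted below: 47.46M rows processed, 0 rejected,
  // so the reject ratio is 0.0 and the filter is disabled as soon as the check runs.
  std::vector<FilterStats> filters = {{47458959, 0, false, true}};
  MaybeDisableIneffectiveFilters(filters);
  std::cout << "filter enabled after check: " << std::boolalpha << filters[0].enabled
            << std::endl;  // prints false
  return 0;
}
{noformat}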
If there are multiple runtime filters and one of them is very selective, the
scanner may not produce row batches very often, so this check rarely runs. This
means that an ineffective filter can be evaluated on many rows before being
disabled (or may never be disabled at all). For example, from TPC-DS Q13:
{noformat}
- Rows processed: 47.46M (47458959)
- Rows rejected: 0 (0)
- Rows total: 47.46M (47458959)
{noformat}
We should try to disable ineffective filters more aggressively in this
circumstance.
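One possible direction, sketched below purely as an illustration rather than a
proposed patch: trigger the effectiveness check based on rows read as well as on
batches produced, so that a scan dominated by one very selective filter still
re-checks the other filters. ROWS_PER_FILTER_SELECTIVITY_CHECK,
ShouldCheckFilterEffectiveness() and the counters around them are hypothetical
names; only BATCHES_PER_FILTER_SELECTIVITY_CHECK (16) comes from the existing code.
{noformat}
#include <cstdint>
#include <iostream>

// Existing batch-based constant (16) plus a hypothetical row-based threshold.
constexpr int64_t BATCHES_PER_FILTER_SELECTIVITY_CHECK = 16;
constexpr int64_t ROWS_PER_FILTER_SELECTIVITY_CHECK = 1 << 20;  // ~1M rows, arbitrary

struct ScannerCounters {
  int64_t row_batches_produced = 0;
  int64_t rows_since_last_check = 0;
};

// Returns true when the effectiveness check should run: either every 16 produced
// batches (today's behavior) or after a fixed number of rows have been read,
// whichever happens first.
bool ShouldCheckFilterEffectiveness(ScannerCounters& c, int64_t rows_read_this_batch) {
  c.rows_since_last_check += rows_read_this_batch;
  bool batch_trigger = c.row_batches_produced > 0 &&
      (c.row_batches_produced & (BATCHES_PER_FILTER_SELECTIVITY_CHECK - 1)) == 0;
  bool row_trigger = c.rows_since_last_check >= ROWS_PER_FILTER_SELECTIVITY_CHECK;
  if (batch_trigger || row_trigger) {
    c.rows_since_last_check = 0;
    return true;
  }
  return false;
}

int main() {
  ScannerCounters c;
  // Simulate a scan where a very selective filter means no row batches are ever
  // produced, but rows keep being read and evaluated against all filters.
  for (int64_t rows_read = 0; rows_read < 4000000; rows_read += 1024) {
    if (ShouldCheckFilterEffectiveness(c, 1024)) {
      std::cout << "check fires after ~" << rows_read + 1024 << " rows read" << std::endl;
    }
  }
  return 0;
}
{noformat}
Whether such a row threshold should be a constant, a query option, or derived
from min_filter_reject_ratio is left open here.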