Joe McDonnell created IMPALA-14510:
--------------------------------------

             Summary: Runtime filters with low effectiveness should be disabled 
more aggressively
                 Key: IMPALA-14510
                 URL: https://issues.apache.org/jira/browse/IMPALA-14510
             Project: IMPALA
          Issue Type: Task
          Components: Backend
    Affects Versions: Impala 5.0.0
            Reporter: Joe McDonnell


There is an existing check of runtime filter effectiveness in 
HdfsScanner::CheckFiltersEffectiveness(), which can disable runtime filters that 
are always true or that don't meet min_filter_reject_ratio (default 0.1). 
This check runs every BATCHES_PER_FILTER_SELECTIVITY_CHECK (16) row batches:
{noformat}
    // Always add batch to the queue because it may contain data referenced by previously
    // appended batches.
    scan_node->AddMaterializedRowBatch(move(batch));
    RETURN_IF_ERROR(status);
    ++row_batches_produced_;
    if ((row_batches_produced_ & (BATCHES_PER_FILTER_SELECTIVITY_CHECK - 1)) == 0) {
      CheckFiltersEffectiveness();
    }{noformat}
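For context, the check itself conceptually boils down to something like the 
following sketch (FilterStats here and its considered/rejected/always_true 
fields are illustrative stand-ins, not the exact Impala types):
{noformat}
// Illustrative sketch only; the real logic lives in HdfsScanner::CheckFiltersEffectiveness().
// FilterStats and its fields are hypothetical stand-ins for the per-filter counters.
#include <cstdint>
#include <vector>

struct FilterStats {
  int64_t considered = 0;   // rows the filter was evaluated against
  int64_t rejected = 0;     // rows the filter rejected
  bool always_true = false; // filter is known to pass everything
  bool enabled = true;
};

void CheckFiltersEffectivenessSketch(std::vector<FilterStats>* filters,
    double min_filter_reject_ratio) {
  for (FilterStats& f : *filters) {
    if (!f.enabled || f.considered == 0) continue;
    double reject_ratio = static_cast<double>(f.rejected) / f.considered;
    // Disable filters that pass everything or reject too small a fraction of rows.
    if (f.always_true || reject_ratio < min_filter_reject_ratio) f.enabled = false;
  }
}{noformat}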
If there are multiple runtime filters and one of them is very selective, the 
scanner may not produce row batches very often, so this check rarely (or never) 
runs. As a result, an ineffective filter can be evaluated on many rows before 
being disabled, or may never be disabled at all. For example, from TPC-DS Q13:
{noformat}
 - Rows processed: 47.46M (47458959)
 - Rows rejected: 0 (0)
 - Rows total: 47.46M (47458959)
{noformat}
We should try to disable ineffective filters more aggressively in this 
circumstance.
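
One possible direction (a sketch under assumed names, not a committed design) 
is to drive the check off the number of rows the filters have been evaluated 
against rather than the number of row batches produced, so that a filter which 
rejects nothing still gets re-checked after a bounded amount of work:
{noformat}
// Sketch only: trigger the effectiveness check on rows processed instead of row
// batches produced. ROWS_PER_FILTER_SELECTIVITY_CHECK is a hypothetical constant.
#include <cstdint>

static const int64_t ROWS_PER_FILTER_SELECTIVITY_CHECK = 1 << 20;  // ~1M rows, illustrative

// Returns true when enough rows have been evaluated since the last check that
// CheckFiltersEffectiveness() should run again, regardless of how many batches
// survived the filters.
bool ShouldCheckFilters(int64_t* rows_since_last_check, int64_t rows_in_batch) {
  *rows_since_last_check += rows_in_batch;
  if (*rows_since_last_check < ROWS_PER_FILTER_SELECTIVITY_CHECK) return false;
  *rows_since_last_check = 0;
  return true;
}{noformat}
Under a trigger like this, the Q13 filter above (0 of 47.46M rows rejected) 
would have been re-checked and disabled after roughly the first 
ROWS_PER_FILTER_SELECTIVITY_CHECK rows instead of being evaluated on all 47M.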



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
