Zoltán Borók-Nagy created IMPALA-15111:
------------------------------------------

             Summary: Generate min/max runtime filters for inequality join 
predicates on hash joins
                 Key: IMPALA-15111
                 URL: https://issues.apache.org/jira/browse/IMPALA-15111
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
            Reporter: Zoltán Borók-Nagy


Currently, Impala only generates min/max runtime filters from equi-join 
predicates on hash joins (a.col = b.col). For inequality join predicates (e.g., 
a.col <= b.col), min/max filters are only supported when the right side is a 
non-correlated scalar subquery.

This means range-overlap join patterns like:

{noformat}
  SELECT ...
  FROM fact_table a
  INNER JOIN dim_table b ON a.key = b.key
    AND a.start_time <= b.end_time
    AND a.end_time >= b.start_time
{noformat}
do not benefit from min/max runtime filters on start_time/end_time, even though 
the build side's min/max bounds could effectively prune row groups and pages on 
the probe side.

*Proposed Enhancement*:

  Extend RuntimeFilterGenerator to recognize inequality join predicates of the 
form probe.col {<=, <, >=, >} build.col on hash joins and generate min/max 
runtime filters for the probe-side column. Specifically:

  - For probe.col <= build.col: the build side computes max(build.col) and the 
probe side filters rows/row groups where probe.col > max(build.col).
  - For probe.col >= build.col: the build side computes min(build.col) and the 
probe side filters rows/row groups where probe.col < min(build.col).

This would enable Parquet row group/page-level pruning via overlap checks for 
range-join patterns, which are common in time-series and event-correlation 
workloads.

*Implementation Notes*:

  - In RuntimeFilterGenerator.java, the hash join path currently only calls 
isSqlEquivalencePredicate(). Add handling for BinaryPredicate with <=, <, >=, > 
operators where one side is bound to the build and the other to the probe.
  - Generate TRuntimeFilterType.MIN_MAX filters only (bloom filters don't apply 
to inequality predicates).
  - The build-side aggregation (computing min/max of the build column) is 
already how min/max filters work for equi-joins — the same mechanism can be 
reused.
  - On the scan side, the existing EvaluateOverlapForRowGroup() and page-level 
overlap logic should work as-is once the filter is generated and assigned to 
the scan node.




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to