Zoltán Borók-Nagy created IMPALA-15111:
------------------------------------------
Summary: Generate min/max runtime filters for inequality join
predicates on hash joins
Key: IMPALA-15111
URL: https://issues.apache.org/jira/browse/IMPALA-15111
Project: IMPALA
Issue Type: Improvement
Components: Frontend
Reporter: Zoltán Borók-Nagy
Currently, Impala only generates min/max runtime filters from equi-join
predicates on hash joins (a.col = b.col). For inequality join predicates (e.g.,
a.col <= b.col), min/max filters are only supported when the right side is a
non-correlated scalar subquery.
This means range-overlap join patterns like:
{noformat}
SELECT ...
FROM fact_table a
INNER JOIN dim_table b ON a.key = b.key
AND a.start_time <= b.end_time
AND a.end_time >= b.start_time
{noformat}
do not benefit from min/max runtime filters on start_time/end_time, even though
the build side's min/max bounds could effectively prune row groups and pages on
the probe side.
*Proposed Enhancement*:
Extend RuntimeFilterGenerator to recognize inequality join predicates of the
form probe.col {<=, <, >=, >} build.col on hash joins and generate min/max
runtime filters for the probe-side column. Specifically:
- For probe.col <= build.col: the build side computes max(build.col) and the
probe side filters rows/row groups where probe.col > max(build.col).
- For probe.col >= build.col: the build side computes min(build.col) and the
probe side filters rows/row groups where probe.col < min(build.col).
This would enable Parquet row group/page-level pruning via overlap checks for
range-join patterns, which are common in time-series and event-correlation
workloads.
*Implementation Notes*:
- In RuntimeFilterGenerator.java, the hash join path currently only calls
isSqlEquivalencePredicate(). Add handling for BinaryPredicate with <=, <, >=, >
operators where one side is bound to the build and the other to the probe.
- Generate TRuntimeFilterType.MIN_MAX filters only (bloom filters don't apply
to inequality predicates).
- The build-side aggregation (computing min/max of the build column) is
already how min/max filters work for equi-joins — the same mechanism can be
reused.
- On the scan side, the existing EvaluateOverlapForRowGroup() and page-level
overlap logic should work as-is once the filter is generated and assigned to
the scan node.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)