diegoQuinas opened a new pull request, #22643: URL: https://github.com/apache/datafusion/pull/22643
## Which issue does this PR close?

- Closes #22621.

## Rationale for this change

`push_down_filter_regression.slt` (added in #22150) asserts the exact `DynamicFilter` content rendered by `EXPLAIN ANALYZE` on the `agg_dyn_*` fixtures. That content is **not** deterministic: the filter threshold tightens as each `AggregateExec(mode=Partial)` publishes its running `min`/`max`, and the `EXPLAIN ANALYZE` snapshot can be taken while the filter is still converging.

For `agg_dyn_single`, `file_0` holds the global `min` (1) and `file_1` a larger partial `min` (3). If the snapshot lands after `file_1` publishes `3` but before `file_0` publishes `1`, the filter reads `a < 3` instead of the final `a < 1`, which is exactly the intermittent CI failure reported in #22621. The fixture's comment incorrectly claimed the filter *content* was deterministic and only the pruning *counts* raced.

## What changes are included in this PR?

Make the filter content independent of publish order by giving **every file the same per-file min/max**, so any snapshot equals the fully converged filter:

- `agg_dyn_single`: both files hold `(1), (8)`, so each file has `min=1, max=8`.
- `agg_dyn_two_col`: each file has `min(a)=1, max(b)=9`.
- `agg_dyn_mixed`: each file has `min(a)=1, max(a)=8, max(b)=12`.

`agg_dyn_two_col` and `agg_dyn_mixed` were not in the reported failure but shared the same latent race (differing per-file extremes), so they are fixed too. `agg_dyn_nulls` is left untouched: its filter is always `true` and never races.

The expected plan text is **unchanged**; only the input data and the misleading comments are modified.

The alternative of forcing a single partition was rejected: dynamic aggregate filters are only emitted in `Partial+Final` mode (`target_partitions >= 2`), so a single partition would emit no filter at all.

## Are these changes tested?

Yes: the modified `push_down_filter_regression.slt` itself is the test.
It passes, and because the asserted filter content no longer depends on partition scheduling, it is stable across runs (verified by running it repeatedly locally).

## Are there any user-facing changes?

No. Test-only change.
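
The publish-order race described in the rationale can be illustrated with a small toy model. This is only a sketch under stated assumptions, not DataFusion code: the helper `observable_thresholds` is hypothetical, each file is assumed to publish its partial `min` exactly once in an arbitrary order, and a snapshot is assumed to land after at least one publish. The per-file values come from the `agg_dyn_single` description above.

```python
from itertools import permutations


def observable_thresholds(per_file_mins):
    """Return every threshold `t` a snapshot could show for a filter `a < t`.

    Toy model (an assumption, not DataFusion internals): each file publishes
    its partial min once, in any order, and the filter uses the smallest
    value published so far.
    """
    seen = set()
    for order in permutations(per_file_mins):
        running = None
        for value in order:
            running = value if running is None else min(running, value)
            seen.add(running)  # a snapshot may land right after any publish
    return seen


old_fixture = [1, 3]  # file_0 min=1, file_1 min=3
new_fixture = [1, 1]  # both files hold (1), (8), so each file has min=1

print(sorted(observable_thresholds(old_fixture)))  # [1, 3]: `a < 3` can be observed
print(sorted(observable_thresholds(new_fixture)))  # [1]: every snapshot reads `a < 1`
```

With differing per-file minima, the snapshot can show either threshold depending on scheduling; with identical minima, every publish order yields the same value, which is why the asserted plan text no longer depends on timing.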
