diegoQuinas opened a new pull request, #22643:
URL: https://github.com/apache/datafusion/pull/22643

   ## Which issue does this PR close?
   
   - Closes #22621.
   
   ## Rationale for this change
   
   `push_down_filter_regression.slt` (added in #22150) asserts the exact
   `DynamicFilter` content rendered by `EXPLAIN ANALYZE` on the `agg_dyn_*`
   fixtures. That content is **not** deterministic: the filter threshold
   tightens as each `AggregateExec(mode=Partial)` publishes its running
   `min`/`max`, and the `EXPLAIN ANALYZE` snapshot can be taken while the
   filter is still converging.
   
   For `agg_dyn_single`, `file_0` holds the global `min` (1) and `file_1` holds
   a larger partial `min` (3). If the snapshot lands after `file_1` publishes `3`
   but before `file_0` publishes `1`, the filter reads `a < 3` instead of the
   final `a < 1`. This is exactly the intermittent CI failure reported in
   #22621. The fixture's comment incorrectly claimed that the filter *content*
   was deterministic and that only the pruning *counts* raced.
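   To make the race concrete, here is a toy Rust model of a `min(a)` filter that
   tightens as each partial aggregate publishes. It is a sketch under stated
   assumptions, not DataFusion's implementation: the `snapshots` helper is
   hypothetical, the `a < m` rendering is simplified, and a snapshot is assumed
   to be taken right after some file publishes. With `file_0` holding min 1 and
   `file_1` holding min 3, the publish order decides what an early snapshot sees.

   ```rust
   /// Hypothetical, simplified model of a dynamic `min(a)` filter (not
   /// DataFusion's actual code): each partial aggregate publishes its file's
   /// minimum, and the filter becomes `a < m`, where `m` is the smallest
   /// minimum published so far.
   ///
   /// Returns the filter text a snapshot would see right after each publish,
   /// following `publish_order` (indices into `per_file_min`).
   fn snapshots(per_file_min: &[i64], publish_order: &[usize]) -> Vec<String> {
       let mut running: Option<i64> = None;
       publish_order
           .iter()
           .map(|&file| {
               let m = per_file_min[file];
               // Tighten the threshold only if this file's minimum is smaller.
               let t = running.map_or(m, |r| r.min(m));
               running = Some(t);
               format!("a < {t}")
           })
           .collect()
   }
   ```

   `snapshots(&[1, 3], &[1, 0])` (`file_1` publishes first) yields
   `["a < 3", "a < 1"]`, so an early snapshot observes `a < 3`, while
   `snapshots(&[1, 3], &[0, 1])` yields `["a < 1", "a < 1"]`.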
   
   ## What changes are included in this PR?
   
   Make the filter content independent of publish order by giving **every file
   the same per-file min/max**, so any snapshot equals the fully converged filter:
   
   - `agg_dyn_single` — both files `(1), (8)` → each file `min=1, max=8`.
   - `agg_dyn_two_col` — each file `min(a)=1, max(b)=9`.
   - `agg_dyn_mixed` — each file `min(a)=1, max(a)=8, max(b)=12`.
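   The claim that any snapshot equals the converged filter can be checked with a
   small brute-force sketch. It assumes each file contributes one `(min, max)`
   pair, that the filter tracks the running `(min, max)` over the files published
   so far, and that snapshots are only taken after at least one file has
   published. `MinMax`, `merge`, `permutations`, and `order_independent` are
   hypothetical helpers, not DataFusion code. The property holds exactly when
   every file already carries the global extremes, because the earliest possible
   snapshot reflects a single file.

   ```rust
   /// Per-file (min, max) of the aggregated column, as published by one
   /// hypothetical partial aggregate.
   type MinMax = (i64, i64);

   /// Fold one file's extremes into the running filter state.
   fn merge(acc: Option<MinMax>, (lo, hi): MinMax) -> MinMax {
       match acc {
           None => (lo, hi),
           Some((cur_lo, cur_hi)) => (cur_lo.min(lo), cur_hi.max(hi)),
       }
   }

   /// All orderings of `0..n` (fine for the handful of files in a fixture).
   fn permutations(n: usize) -> Vec<Vec<usize>> {
       if n == 0 {
           return vec![Vec::new()];
       }
       let mut out = Vec::new();
       for perm in permutations(n - 1) {
           // Insert the new index `n - 1` at every position of a shorter ordering.
           for pos in 0..=perm.len() {
               let mut p = perm.clone();
               p.insert(pos, n - 1);
               out.push(p);
           }
       }
       out
   }

   /// True when every snapshot taken after at least one publish, under every
   /// publish order, already equals the fully converged (min, max).
   fn order_independent(per_file: &[MinMax]) -> bool {
       let converged = per_file.iter().fold(None, |acc, &p| Some(merge(acc, p)));
       permutations(per_file.len()).iter().all(|order| {
           let mut running: Option<MinMax> = None;
           order.iter().all(|&file| {
               running = Some(merge(running, per_file[file]));
               running == converged
           })
       })
   }
   ```

   With the new `agg_dyn_single` data, `order_independent(&[(1, 8), (1, 8)])`
   returns `true`; giving one file a larger minimum, e.g. the illustrative
   `&[(1, 8), (3, 8)]`, makes it return `false`.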
   
   `agg_dyn_two_col` and `agg_dyn_mixed` were not part of the reported failure
   but shared the same latent race (differing per-file extremes), so they are
   fixed too. `agg_dyn_nulls` is left untouched: its filter is always `true` and
   never races.
   
   The expected plan text is **unchanged**; only the input data and the
   misleading comments are modified. The alternative of forcing a single
   partition was rejected: dynamic aggregate filters are only emitted in
   `Partial+Final` mode (`target_partitions >= 2`), so a single partition would
   emit no filter at all.
   
   ## Are these changes tested?
   
   Yes. The modified `push_down_filter_regression.slt` is itself the test. It
   passes, and because the asserted filter content no longer depends on
   partition scheduling, it is stable across runs (verified by running it
   repeatedly locally).
   
   ## Are there any user-facing changes?
   
   No. Test-only change.
   


-- 
This is an automated message from the Apache Git Service.