geoffreyclaude opened a new pull request, #21958: URL: https://github.com/apache/datafusion/pull/21958
## Which issue does this PR close? - None. ## Rationale for this change The `output_rows_skew` sqllogictest asserts exact per-partition scan row distributions. That test was added before dynamic `FileStream` work scheduling, where reorderable sibling scan streams can now share one queue of unopened files. With shared work scheduling enabled, the `DataSourceExec` explain output can still show four file groups, but the runtime `output_rows` metrics are attributed to whichever output partition actually scans each file. That makes the expected `[4, 0, 1, 0]` distribution nondeterministic: CI has observed the expected `output_rows_skew=84.31%` becoming `output_rows_skew=100%` when one stream scans all non-empty files. This happened on PR #21927 and also reproduced independently on `main` in `verify benchmark results (amd64)`, so it is a pre-existing test flake rather than a regression from that PR. ## What changes are included in this PR? This PR adds `WITH ORDER (x)` to the temporary external Parquet tables used by the `output_rows_skew` sqllogictest. An output ordering makes the scan order-preserving, which disables shared file work stealing for these tables and keeps the test file groups local to their output partitions. This keeps the exact skew assertions meaningful without relaxing the expected metric value. ## Are these changes tested? - `cargo test --features backtrace,parquet_encryption,substrait --profile ci --package datafusion-sqllogictest --test sqllogictests -- explain_analyze.slt --test-threads=1` - `cargo fmt --all` - `cargo clippy --workspace --all-targets --all-features --exclude datafusion-benchmarks -- -D warnings` I also attempted the repository instruction `cargo clippy --all-targets --all-features -- -D warnings`, but it currently fails in `datafusion-benchmarks` before reaching this change because `--all-features` enables the mutually exclusive `snmalloc` and `mimalloc` allocator features. ## Are there any user-facing changes? No. This is a sqllogictest-only determinism fix. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
