Dandandan opened a new pull request, #21731:
URL: https://github.com/apache/datafusion/pull/21731

   ## Purpose
   
   Test branch combining:
   - **#21351** (base) — Dynamic work scheduling in FileStream (inter-file work 
stealing across sibling partitions)
   - **#21580** (merged on top) — Reorder row groups by statistics during sort 
pushdown (intra-file RG reorder for TopK)
   
   The goal is to measure whether the two optimizations compound on TopK-style 
queries (e.g. `ORDER BY col LIMIT N` on multi-file / multi-RG parquet), since 
they operate at different granularities:
   - #21351 balances **files** across partitions at runtime via a shared work 
queue.
   - #21580 reorders **row groups within a file** so TopK sees the best values 
first and tightens the dynamic filter threshold earlier. The tighter threshold 
then propagates across partitions via the filter and can amplify #21351's 
work-stealing gains.
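
As a rough illustration of the two granularities (not the code from either PR), the sketch below models them with plain standard-library types. `RowGroupStats`, `reorder_row_groups_asc`, and `drain_shared_queue` are hypothetical names invented here. The real implementations work on parquet metadata and `FileStream` and are likely structured quite differently.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for one row group's min statistic on the sort column.
#[derive(Debug, Clone, PartialEq)]
pub struct RowGroupStats {
    pub index: usize,
    pub min: i64,
}

/// Intra-file idea: for `ORDER BY col ASC LIMIT N`, visit the row groups with
/// the smallest min first so a TopK operator can see good candidates early.
/// Returns the row group indices in the order they would be read.
pub fn reorder_row_groups_asc(mut row_groups: Vec<RowGroupStats>) -> Vec<usize> {
    row_groups.sort_by_key(|rg| rg.min);
    row_groups.into_iter().map(|rg| rg.index).collect()
}

/// Inter-file idea: sibling partitions pull files from one shared queue at
/// runtime instead of each owning a fixed, pre-assigned list of files.
/// Returns the files each partition ended up processing.
pub fn drain_shared_queue(files: Vec<String>, partitions: usize) -> Vec<Vec<String>> {
    let queue = Arc::new(Mutex::new(VecDeque::from(files)));
    let handles: Vec<_> = (0..partitions)
        .map(|_| {
            let queue = Arc::clone(&queue);
            thread::spawn(move || {
                let mut taken = Vec::new();
                loop {
                    // Hold the lock only long enough to pop one file.
                    let next = queue.lock().unwrap().pop_front();
                    match next {
                        Some(file) => taken.push(file),
                        None => break,
                    }
                }
                taken
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

In this toy model every file is processed exactly once, but which partition picks up which file depends on timing. That timing dependence is the point of balancing at runtime.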
   
   ## Conflict resolution
   
   Only one real conflict: `datafusion/datasource/src/source.rs`.
   
   #21580 had been rebased on top of upstream #21576 (which removes the 
explicit `as_any` method on `DataSource` in favor of an `Any` supertrait 
bound). #21351's base predates #21576, so it still uses the explicit `as_any` 
method.
   
   Resolved in favor of #21351's style:
   - Kept `DataSource: Send + Sync + Debug` with an `as_any(&self) -> &dyn Any` 
method.
   - Restored `as_any` impls on `FileScanConfig` and `MemorySourceConfig`.
   - Added `use std::any::Any` imports in `file_scan_config/mod.rs` and 
`memory.rs`.
   - Rewrote `dyn DataSource::{is, downcast_ref}` helpers to call 
`self.as_any()` (with a `T: 'static` bound).
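
For reference, the resolved shape looks roughly like the sketch below. It is a minimal, self-contained model of the pattern described above, not the actual contents of `source.rs`. The real trait has many more methods, and the `file_count` field is a hypothetical placeholder.

```rust
use std::any::Any;
use std::fmt::Debug;

/// Simplified model of the trait: an explicit `as_any` method rather than an
/// `Any` supertrait bound.
pub trait DataSource: Send + Sync + Debug {
    fn as_any(&self) -> &dyn Any;
}

/// Stand-in for the real struct; `file_count` is a hypothetical field.
#[derive(Debug)]
pub struct FileScanConfig {
    pub file_count: usize,
}

/// Stand-in for the real struct, with no fields for brevity.
#[derive(Debug)]
pub struct MemorySourceConfig;

impl DataSource for FileScanConfig {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl DataSource for MemorySourceConfig {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Helpers on the trait object that route through `as_any()`.
impl dyn DataSource {
    /// Returns true if the underlying concrete type is `T`.
    pub fn is<T: 'static>(&self) -> bool {
        self.as_any().is::<T>()
    }

    /// Returns a reference to the concrete type if it is `T`.
    pub fn downcast_ref<T: 'static>(&self) -> Option<&T> {
        self.as_any().downcast_ref::<T>()
    }
}
```

With this shape, a caller holding a `&dyn DataSource` can write `source.downcast_ref::<FileScanConfig>()` without calling `as_any()` directly.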
   
   ## Status
   
   **Draft — not for merge.** This is an integration branch for benchmarking. 
The two PRs should be reviewed and merged independently upstream.
   
   - [x] `cargo check --workspace` passes
   - [ ] `cargo test` not yet run
   - [ ] Benchmarks not yet run
   
   ## Follow-ups
   
   - Run ClickBench (especially Q23–Q26) and the sort-pushdown benchmarks 
(#21582) on this combined branch vs. each PR individually to quantify the 
compounding effect.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

