zhuqi-lucas commented on issue #22882:
URL: https://github.com/apache/datafusion/issues/22882#issuecomment-4749130221
### Sort Pushdown — skip sorts and skip IO for `ORDER BY` / TopK on Parquet
Sharing what I've been working on (and plan to keep working on) — an
ongoing initiative spanning **DataFusion v52 → v55+**. I just opened a
tracking epic that collects all the merged and in-flight PRs:
- 📋 **Epic: #23036**
**Where we are today**
- ✅ **Framework** (v52) — `PushdownSort` rule + `Exact / Inexact /
Unsupported` classifier + reverse iteration
- ✅ **Statistics-based file reorder** (v53) — fixes wrong-order
non-overlapping files, upgrades `Unsupported → Exact`, eliminates
`SortExec`, preserves `LIMIT`. **27×–49× speedup on benchmark `ORDER BY
LIMIT` queries**
- ✅ **Multi-partition + `BufferExec`** (v54) — explicit bounded buffer
per partition so the `SortPreservingMergeExec` (SPM) k-way merge doesn't
stall on IO, plus a cross-partition morsel queue for global
file/row-group (RG) reordering
- 🚧 **Runtime RG-level early stop via TopK dynamic filter** (v55, in
review) — closes the "can't shut off the tap mid-file" gap. **5 of 11
topk_tpch queries gain 3–4×, 0 regressions, total runtime -44%.**
Main PR: #22450
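
For readers new to the work, here is a rough Python sketch of two ideas from the list above: reordering files by min/max statistics, and stopping a TopK scan early once no remaining row group can improve the result. It is an illustration only, not the DataFusion code. `FileStats`, `reorder_by_stats`, and `topk_with_early_stop` are hypothetical names, and it assumes integer sort keys, an ascending `ORDER BY`, and per-row-group min statistics.

```python
from dataclasses import dataclass
import heapq


@dataclass
class FileStats:
    """Hypothetical per-file min/max statistics for the sort column."""
    name: str
    min_val: int
    max_val: int


def reorder_by_stats(files):
    """Order files by min value and report whether their ranges overlap.

    If the ranges do not overlap (and each file is itself sorted), reading
    the files in this order yields globally ascending output, so the
    ordering can be treated as exact and no separate sort is needed.
    """
    ordered = sorted(files, key=lambda f: (f.min_val, f.max_val))
    exact = all(a.max_val <= b.min_val for a, b in zip(ordered, ordered[1:]))
    return ordered, exact


def topk_with_early_stop(row_groups, k):
    """Return the k smallest values and the number of row groups scanned.

    row_groups is a list of (min_val, values) pairs. Groups are visited in
    ascending min order; once the heap is full and a group's min is not
    below the current k-th smallest value, that group and all later ones
    are skipped.
    """
    heap = []  # max-heap via negation; holds the current best k values
    scanned = 0
    for min_val, values in sorted(row_groups, key=lambda rg: rg[0]):
        if len(heap) == k and min_val >= -heap[0]:
            break  # nothing here or later can beat the threshold
        scanned += 1
        for v in values:
            if len(heap) < k:
                heapq.heappush(heap, -v)
            elif v < -heap[0]:
                heapq.heapreplace(heap, -v)
    return sorted(-x for x in heap), scanned
```

In this toy model the threshold check stands in for the dynamic filter: the better the visiting order, the sooner the threshold tightens and the more row groups are never read.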
**What's next (Q3-Q4)**
- Land #22450 and the helpers (#22385, #21712, #21580, #21828)
- Cross-repo with arrow-rs: `peek_next_row_group()` (arrow-rs#10158)
and page-level reverse (arrow-rs#9937)
- Once arrow-rs ships page-level reverse, a follow-up DataFusion PR will
upgrade `DESC` scans over ASC-sorted Parquet to **`Exact`**, which drops
the `SortExec` and emits `LIMIT N` as a source-side static fetch
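
As a toy model of that last item, the sketch below shows why an exact reverse scan needs neither a sort nor a full read. It assumes row groups are plain Python lists already sorted ascending, with non-overlapping ranges. `reverse_scan_with_fetch` is a hypothetical name, not an arrow-rs or DataFusion API.

```python
from itertools import islice


def reverse_scan_with_fetch(row_groups, fetch):
    """Return up to `fetch` rows in descending order.

    row_groups is a list of lists, sorted ascending both within and across
    groups. Walking groups and rows backwards yields descending output
    directly, and islice stops pulling rows once `fetch` rows are produced,
    so earlier row groups are never touched.
    """
    def rows():
        for group in reversed(row_groups):
            yield from reversed(group)

    return list(islice(rows(), fetch))
```

For example, `reverse_scan_with_fetch([[1, 2, 3], [4, 5, 6], [7, 8, 9]], 4)` reads only the last two groups.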
Always happy to get review cycles or co-design help on the in-flight
PRs — see the epic for the full list.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]