zhuqi-lucas commented on code in PR #21580:
URL: https://github.com/apache/datafusion/pull/21580#discussion_r3071329377
##########
datafusion/datasource-parquet/src/access_plan.rs:
##########
@@ -377,6 +382,106 @@ impl PreparedAccessPlan {
})
}
+ /// Reorder row groups by their min statistics for the given sort order.
+ ///
+ /// This helps TopK queries find optimal values first. For ASC sort,
+ /// row groups with the smallest min values come first. For DESC sort,
+ /// row groups with the largest min values come first.
+ ///
+ /// Gracefully skips reordering when:
+ /// - There is a row_selection (too complex to remap)
+ /// - 0 or 1 row groups (nothing to reorder)
+ /// - Sort expression is not a simple column reference
+ /// - Statistics are unavailable
+ pub(crate) fn reorder_by_statistics(
Review Comment:
Thanks @Dandandan! I created #21581 to track this. The existing infrastructure
from this PR should be directly reusable; it mainly needs the aggregate planner
to populate `sort_order_for_reorder` from grouping keys.
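
For readers following along, here is a minimal sketch of the reordering that the doc comment in the hunk above describes. It is not the PR's implementation: `RowGroupInfo` and `reorder_by_min` are hypothetical names made up for illustration, the sort column is assumed to be a plain `i64`, and the row_selection and non-column-expression skip cases are left out.

```rust
/// Hypothetical stand-in for per-row-group statistics (not a DataFusion type).
#[derive(Debug, Clone)]
struct RowGroupInfo {
    /// Position of the row group in the file.
    index: usize,
    /// Min value of the sort column, or None when statistics are unavailable.
    min: Option<i64>,
}

/// Returns row group indexes ordered by their min statistic.
/// Returns None when reordering should be skipped: fewer than two
/// row groups, or any row group without a min statistic.
fn reorder_by_min(groups: &[RowGroupInfo], descending: bool) -> Option<Vec<usize>> {
    if groups.len() <= 1 {
        return None; // nothing to reorder
    }
    // Collecting into Option<Vec<_>> yields None if any min is missing.
    let mut keyed: Vec<(i64, usize)> = groups
        .iter()
        .map(|g| g.min.map(|m| (m, g.index)))
        .collect::<Option<Vec<_>>>()?;
    // Stable sort: row groups with equal min values keep their file order.
    if descending {
        keyed.sort_by(|a, b| b.0.cmp(&a.0)); // largest min first
    } else {
        keyed.sort_by(|a, b| a.0.cmp(&b.0)); // smallest min first
    }
    Some(keyed.into_iter().map(|(_, idx)| idx).collect())
}
```

With min values of 50, 10, and 30 for row groups 0, 1, and 2, an ASC sort would visit them as 1, 2, 0 and a DESC sort as 0, 2, 1. Returning `None` rather than an error mirrors the "gracefully skips" wording: the caller simply keeps the original order.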
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]