adriangb commented on code in PR #19064:
URL: https://github.com/apache/datafusion/pull/19064#discussion_r2615516277
##########
datafusion/datasource-parquet/src/source.rs:
##########
@@ -710,6 +742,34 @@ impl FileSource for ParquetSource {
)
.with_updated_node(source))
}
+
+ /// When push down to parquet source of a sort operation is possible,
+ /// create a new ParquetSource with reverse_scan enabled.
+ ///
+ /// # Phase 1 Behavior (Current)
+ /// Returns `Inexact` because we're only reversing the scan direction and reordering
+ /// files/row groups. We still need to verify ordering at a higher level.
+ ///
+ /// # Phase 2 (Future)
+ /// Could return `Exact` when we can guarantee that the scan order matches the requested order, and
+ /// we can remove any higher-level sort operations.
+ ///
+ /// TODO support more policies in addition to reversing the scan.
+ fn try_pushdown_sort(
Review Comment:
Yeah, something like that. We can always add helpers for the "special"
case of "I can only reverse the existing sort order", which is the approach
we've taken with projections, and it has worked nicely. Essentially, keep the
APIs a bit more general (they're not that complex anyway, nothing like the
filter pushdown stuff) and then make them easy to implement by providing
helpers (e.g. we can export
`can_satisfy_sort_order_by_reversing_scan_order(plan_order: ..., file_order: ...)`).

Fundamentally, only each `FileSource` knows what tricks it can do to
better satisfy the plan's order. To make that decision it needs (1) the order
the plan wants and (2) the order its files are in.
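
To make the helper idea concrete, here is a rough sketch of what such a function could look like. `SortKey` is a simplified, hypothetical stand-in and not DataFusion's actual ordering type, and the signature is only a guess at the elided parameters above. It assumes that reading sorted data backwards flips both the direction and the null placement of every key, so the plan's order is satisfied when it is a prefix of the flipped file order.

```rust
/// Hypothetical, simplified sort key (NOT DataFusion's real ordering type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SortKey {
    pub column: String,
    pub descending: bool,
    pub nulls_first: bool,
}

impl SortKey {
    pub fn new(column: &str, descending: bool, nulls_first: bool) -> Self {
        Self {
            column: column.to_string(),
            descending,
            nulls_first,
        }
    }

    /// The key as observed when the rows are read in reverse:
    /// both the direction and the null placement flip.
    pub fn reversed(&self) -> Self {
        Self {
            column: self.column.clone(),
            descending: !self.descending,
            nulls_first: !self.nulls_first,
        }
    }
}

/// Illustrative helper (assumed signature): returns true when scanning the
/// files backwards would produce `plan_order`, i.e. when `plan_order` is a
/// non-empty prefix of the reversed `file_order`.
pub fn can_satisfy_sort_order_by_reversing_scan_order(
    plan_order: &[SortKey],
    file_order: &[SortKey],
) -> bool {
    !plan_order.is_empty()
        && plan_order.len() <= file_order.len()
        && plan_order
            .iter()
            .zip(file_order)
            .all(|(want, have)| *want == have.reversed())
}
```

With files ordered by `a ASC NULLS LAST, b DESC NULLS FIRST`, a plan asking for `a DESC NULLS FIRST` would pass this check, while a plan asking for `a ASC NULLS LAST` or for `b` alone would not. A source that gets `true` back could then decide on its own whether to report the pushdown as exact or inexact.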