adriangb commented on code in PR #19064:
URL: https://github.com/apache/datafusion/pull/19064#discussion_r2615493004
##########
datafusion/datasource-parquet/src/source.rs:
##########
@@ -710,6 +742,34 @@ impl FileSource for ParquetSource {
)
.with_updated_node(source))
}
+
+ /// When push down to parquet source of a sort operation is possible,
+ /// create a new ParquetSource with reverse_scan enabled.
+ ///
+ /// # Phase 1 Behavior (Current)
+ /// Returns `Inexact` because we're only reversing the scan direction and reordering
+ /// files/row groups. We still need to verify ordering at a higher level.
+ ///
+ /// # Phase 2 (Future)
+ /// Could return `Exact` when we can guarantee that the scan order matches the requested order, and
+ /// we can remove any higher-level sort operations.
+ ///
+ /// TODO support more policies in addition to reversing the scan.
+ fn try_pushdown_sort(
Review Comment:
> since the ParquetSource doesn't actually know how it is sorted (the sort ordering is on the FileScanConfig, [here](https://github.com/apache/datafusion/blob/eb39e540f8373ea921fb4f9f1cd57eaa59bedd00/datafusion/datasource/src/file_scan_config.rs#L159-L160))

This seems like something we should change!
IMO the breakdown should be:
- FileScanConfig knows how the file groups are sorted
- FileSource knows how the *files* themselves are sorted
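
A rough sketch of what that split could look like. Everything here is hypothetical and simplified for illustration: `SortKey`, `SketchFileSource`, `SketchScanConfig`, `scan_direction` and `classify_pushdown` are made-up names, not DataFusion APIs, and the `Exact` case is only what a later phase might return, not what this PR does.

```rust
// Hypothetical, simplified types for illustration only (not DataFusion APIs).

/// A single sort key: column name plus direction.
#[derive(Debug, Clone, PartialEq)]
struct SortKey {
    column: String,
    descending: bool,
}

/// Small helper to build a key.
fn key(column: &str, descending: bool) -> SortKey {
    SortKey { column: column.to_string(), descending }
}

/// True when `requested` is a leading prefix of `known`.
fn is_prefix(requested: &[SortKey], known: &[SortKey]) -> bool {
    requested.len() <= known.len()
        && requested.iter().zip(known).all(|(r, k)| r == k)
}

/// Flip the direction of every key.
fn reversed(keys: &[SortKey]) -> Vec<SortKey> {
    keys.iter()
        .map(|k| SortKey { column: k.column.clone(), descending: !k.descending })
        .collect()
}

#[derive(Debug, PartialEq)]
enum Direction {
    Forward,
    Reverse,
}

#[derive(Debug, PartialEq)]
enum Pushdown {
    Exact,
    Inexact,
    Unsupported,
}

/// Stand-in for a file source: knows how rows *within each file* are sorted.
struct SketchFileSource {
    within_file: Vec<SortKey>,
}

impl SketchFileSource {
    /// Which scan direction, if any, makes each file produce `requested`.
    fn scan_direction(&self, requested: &[SortKey]) -> Option<Direction> {
        if requested.is_empty() {
            return None;
        }
        if is_prefix(requested, &self.within_file) {
            Some(Direction::Forward)
        } else if is_prefix(&reversed(requested), &self.within_file) {
            Some(Direction::Reverse)
        } else {
            None
        }
    }
}

/// Stand-in for the scan config: knows how the files *within a group*
/// are ordered relative to each other, and delegates the rest.
struct SketchScanConfig {
    source: SketchFileSource,
    across_files: Vec<SortKey>,
}

impl SketchScanConfig {
    /// Combine both levels of knowledge into one answer.
    fn classify_pushdown(&self, requested: &[SortKey]) -> Pushdown {
        let Some(direction) = self.source.scan_direction(requested) else {
            return Pushdown::Unsupported;
        };
        // A reverse scan also walks the file group backwards, so the
        // group-level ordering must match the flipped request.
        let needed = match direction {
            Direction::Forward => requested.to_vec(),
            Direction::Reverse => reversed(requested),
        };
        if is_prefix(&needed, &self.across_files) {
            Pushdown::Exact
        } else {
            Pushdown::Inexact
        }
    }
}
```

With this shape the source never has to look at file group layout, and the config never has to know how a particular format orders rows inside a file.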
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]