alamb commented on code in PR #19064:
URL: https://github.com/apache/datafusion/pull/19064#discussion_r2615507871
##########
datafusion/datasource-parquet/src/source.rs:
##########
@@ -710,6 +742,34 @@ impl FileSource for ParquetSource {
)
.with_updated_node(source))
}
+
+ /// When push down to parquet source of a sort operation is possible,
+ /// create a new ParquetSource with reverse_scan enabled.
+ ///
+ /// # Phase 1 Behavior (Current)
+ /// Returns `Inexact` because we're only reversing the scan direction and reordering
+ /// files/row groups. We still need to verify ordering at a higher level.
+ ///
+ /// # Phase 2 (Future)
+ /// Could return `Exact` when we can guarantee that the scan order matches the requested order, and
+ /// we can remove any higher-level sort operations.
+ ///
+ /// TODO support more policies in addition to reversing the scan.
+ fn try_pushdown_sort(
Review Comment:
> IMO the breakdown should be:

That certainly sounds interesting -- what would be an example where knowing
the actual sort order in the file is useful? 🤔

Well, now that I think about it, maybe the source needs to know about the
actual order to do the row group / limit trick @xudong963 is working on.
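
To make the `Inexact` / `Exact` distinction in the doc comment above concrete, here is a minimal sketch of how a planner might act on each result. The names `PushdownOutcome`, `PlanDecision`, and `decide` are hypothetical and are not DataFusion's API; the sketch only assumes what the doc comment states, namely that an inexact pushdown still needs the ordering enforced at a higher level, while an exact one would let the higher-level sort be removed.

```rust
/// Hypothetical stand-in for the result of a sort pushdown attempt.
/// This is NOT DataFusion's actual type; it only mirrors the doc comment.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PushdownOutcome {
    /// The source guarantees the requested order (the "Phase 2" case).
    Exact,
    /// The source only approximates the requested order (the "Phase 1" case).
    Inexact,
    /// The source cannot help with this sort at all.
    Unsupported,
}

/// Hypothetical summary of what the planner does with that result.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PlanDecision {
    /// Whether the source is asked to scan in reverse.
    pub reverse_scan: bool,
    /// Whether the higher-level sort stays in the plan.
    pub keep_sort: bool,
}

/// Map a pushdown result to a plan decision (illustrative assumption only).
pub fn decide(outcome: PushdownOutcome) -> PlanDecision {
    match outcome {
        // Order is guaranteed, so the sort above the scan can be dropped.
        PushdownOutcome::Exact => PlanDecision { reverse_scan: true, keep_sort: false },
        // Order is only approximate, so the sort must still run.
        PushdownOutcome::Inexact => PlanDecision { reverse_scan: true, keep_sort: true },
        // Nothing was pushed down; the plan is unchanged.
        PushdownOutcome::Unsupported => PlanDecision { reverse_scan: false, keep_sort: true },
    }
}
```

For example, `decide(PushdownOutcome::Inexact)` keeps the sort while still asking for a reversed scan, which matches the Phase 1 behavior described in the doc comment.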
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]