alamb commented on code in PR #21190:
URL: https://github.com/apache/datafusion/pull/21190#discussion_r3011698342
##########
datafusion/datasource-parquet/src/opener.rs:
##########
@@ -125,15 +133,338 @@ pub(super) struct ParquetOpener {
pub reverse_row_groups: bool,
}
+/// States for [`ParquetOpenFuture`]
+///
+/// These states correspond to the steps required to read and apply various
+/// filter operations.
+///
+/// States whose names beginning with `Load` represent waiting on IO to resolve
+///
+/// ```text
+/// Start
+/// |
+/// v
+/// [LoadEncryption]?
+/// |
+/// v
+/// PruneFile
+/// |
+/// v
+/// LoadMetadata
+/// |
+/// v
+/// PrepareFilters
Review Comment:
Preparing the filter requires the file schema, because the expression must be
adapted to the actual file schema (and thus this work has to be done after the
metadata is fetched).
I am not quite sure what you are suggesting, however.
Do you mean we could load the page index *before* preparing the filters
(i.e., delay preparing the pruning predicates until we have the page index)?
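
To make the ordering constraint concrete, here is a minimal sketch. It is not the code in this PR: `FileSchema`, `State`, `adapt_filter`, `step`, and `run_to_completion` are all hypothetical names, the encryption and file-pruning states are omitted, and the "metadata load" is faked with a fixed two-column schema. It only illustrates that the filter-preparation state cannot be entered until the metadata state has produced a schema.

```rust
// Hypothetical sketch only: none of these types exist in DataFusion.

/// Stand-in for the schema read from the Parquet footer.
#[derive(Debug, PartialEq)]
struct FileSchema {
    columns: Vec<String>,
}

/// Simplified states; the real state machine in the PR has more steps.
#[derive(Debug, PartialEq)]
enum State {
    Start,
    /// Waiting on IO for the file metadata.
    LoadMetadata,
    /// Can only be constructed once a file schema is available.
    PrepareFilters { file_schema: FileSchema },
    /// Filter resolved to a column index in the file (None if absent).
    Done { adapted_filter: Option<usize> },
}

/// "Adapting" a filter here just means resolving the column name it
/// references to that column's position in the actual file schema.
fn adapt_filter(column: &str, schema: &FileSchema) -> Option<usize> {
    schema.columns.iter().position(|c| c == column)
}

/// Advance the state machine by one step.
fn step(state: State, filter_column: &str) -> State {
    match state {
        State::Start => State::LoadMetadata,
        // Assumption: pretend the IO resolved and yielded columns "a", "b".
        State::LoadMetadata => State::PrepareFilters {
            file_schema: FileSchema {
                columns: vec!["a".to_string(), "b".to_string()],
            },
        },
        // The schema is carried in the state, so adaptation is possible here
        // and nowhere earlier.
        State::PrepareFilters { file_schema } => State::Done {
            adapted_filter: adapt_filter(filter_column, &file_schema),
        },
        done @ State::Done { .. } => done,
    }
}

/// Drive the machine from `Start` until it reaches `Done`.
fn run_to_completion(filter_column: &str) -> State {
    let mut state = State::Start;
    while !matches!(state, State::Done { .. }) {
        state = step(state, filter_column);
    }
    state
}
```

In this sketch, moving the page index load ahead of filter preparation would amount to adding another `Load*` state between `LoadMetadata` and `PrepareFilters` that carries the schema forward.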
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]