Re: [PR] Refactor parquet datasource into an explicit state machine [datafusion]

via GitHub Mon, 30 Mar 2026 12:35:03 -0700


alamb commented on code in PR #21190:
URL: https://github.com/apache/datafusion/pull/21190#discussion_r3011698342



##########
datafusion/datasource-parquet/src/opener.rs:
##########
@@ -125,15 +133,338 @@ pub(super) struct ParquetOpener {
     pub reverse_row_groups: bool,
 }
 
+/// States for [`ParquetOpenFuture`]
+///
+/// These states correspond to the steps required to read and apply various
+/// filter operations.
+///
+/// States whose names beginning with `Load` represent waiting on IO to resolve
+///
+/// ```text
+///      Start
+///        |
+///        v
+/// [LoadEncryption]?
+///        |
+///        v
+///    PruneFile
+///        |
+///        v
+///   LoadMetadata
+///        |
+///        v
+///  PrepareFilters

Review Comment:
   Preparing the filter needs the file schema in order to adapt the expression 
to the actual file schema (and thus this work has to be done after the metadata 
is fetched).
   
   I am not quite sure what you are suggesting, however
   
   Do you mean we could load the page index *before* preparing the filters 
(i.e., delay preparing the pruning predicates until we have the page index)?
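   
   To make the ordering constraint concrete, here is a minimal sketch (not the 
code in this PR). The state names follow the diagram in the diff, but the 
payloads and the helpers `load_file_schema`, `adapt_predicate`, `step`, and 
`run` are hypothetical stand-ins invented for illustration. The only point is 
that `PrepareFilters` cannot be constructed until the file schema is available:

```rust
/// Simplified, hypothetical states. The real state machine has more steps
/// (encryption, file pruning, page index loading, ...).
#[derive(Debug, Clone, PartialEq)]
enum State {
    Start,
    /// Waiting on IO: the footer metadata (and thus the file schema).
    LoadMetadata,
    /// Carries the file schema, so it can only be reached after LoadMetadata.
    PrepareFilters { file_columns: Vec<String> },
    Done { adapted_predicate: Option<String> },
}

/// Stand-in for the metadata fetch; returns the file's column names.
fn load_file_schema() -> Vec<String> {
    vec!["a".to_string(), "b".to_string()]
}

/// Stand-in for adapting an expression to the file schema: the predicate is
/// kept only if the column it references exists in this particular file.
fn adapt_predicate(column: &str, file_columns: &[String]) -> Option<String> {
    file_columns
        .iter()
        .any(|c| c == column)
        .then(|| format!("{column} > 5"))
}

/// Advance the state machine by one transition.
fn step(state: State, predicate_column: &str) -> State {
    match state {
        State::Start => State::LoadMetadata,
        State::LoadMetadata => State::PrepareFilters {
            file_columns: load_file_schema(),
        },
        State::PrepareFilters { file_columns } => State::Done {
            adapted_predicate: adapt_predicate(predicate_column, &file_columns),
        },
        done @ State::Done { .. } => done,
    }
}

/// Drive the state machine until it reaches `Done`.
fn run(predicate_column: &str) -> State {
    let mut state = State::Start;
    while !matches!(state, State::Done { .. }) {
        state = step(state, predicate_column);
    }
    state
}
```

   In this sketch, moving the page index load ahead of `PrepareFilters` would 
just mean inserting another `Load*` state between `LoadMetadata` and 
`PrepareFilters`; the dependency on the file schema would stay the same.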



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

