Re: [PR] Push down projection expressions into ParquetOpener [datafusion]

via GitHub Tue, 09 Dec 2025 12:06:20 -0800


adriangb commented on code in PR #19111:
URL: https://github.com/apache/datafusion/pull/19111#discussion_r2604144000



##########
datafusion/datasource-parquet/src/row_filter.rs:
##########
@@ -176,42 +176,32 @@ pub(crate) struct FilterCandidate {
     required_bytes: usize,
     /// Can this filter use an index (e.g. a page index) to prune rows?
     can_use_index: bool,
-    /// The projection to read from the file schema to get the columns
-    /// required to evaluate the filter expression.
+    /// Column indices into the parquet file schema required to evaluate this filter.
     projection: Vec<usize>,
-    /// The projected table schema that this filter references
+    /// The Arrow schema containing only the columns required by this filter,
+    /// projected from the file's Arrow schema.
     filter_schema: SchemaRef,
 }
 
 /// Helper to build a `FilterCandidate`.
 ///
-/// This will do several things
+/// This will do several things:
 /// 1. Determine the columns required to evaluate the expression
 /// 2. Calculate data required to estimate the cost of evaluating the filter
 ///
-/// Note that this does *not* handle any adaptation of the data schema to the expression schema,
-/// it is assumed that the expression has already been adapted to the file schema before being passed in here,
-/// generally using [`PhysicalExprAdapter`](datafusion_physical_expr_adapter::PhysicalExprAdapter).
+/// Note: This does *not* handle any adaptation of the expression to the file schema.
+/// The expression must already be adapted before being passed in here, generally using
+/// [`PhysicalExprAdapter`](datafusion_physical_expr_adapter::PhysicalExprAdapter).
 struct FilterCandidateBuilder {
     expr: Arc<dyn PhysicalExpr>,
-    /// The schema of this parquet file.
+    /// The Arrow schema of this parquet file (the result of converting the
+    /// parquet schema to Arrow, potentially with type coercions applied).
     file_schema: SchemaRef,
-    /// The schema of the table (merged schema) -- columns may be in different
-    /// order than in the file and have columns that are not in the file schema
-    table_schema: SchemaRef,

Review Comment:
   These two schemas are now always the same when called from `ParquetOpener`,
   so there was no point in keeping both around.
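
To illustrate how `projection` and `filter_schema` relate once only the file schema is left, here is a minimal sketch. It uses plain std types as stand-ins for Arrow's `Schema`/`Field`; the `Field` struct and the helpers `field`, `required_indices`, and `project` are hypothetical names for illustration and are not the code in this PR. Sorting and deduplicating the indices is an assumption of the sketch.

```rust
use std::collections::BTreeSet;

/// Stand-in for an Arrow field (hypothetical; not the arrow crate's type).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
}

/// Hypothetical convenience constructor for the stand-in field.
fn field(name: &str, data_type: &str) -> Field {
    Field { name: name.to_string(), data_type: data_type.to_string() }
}

/// Resolve the column names a filter references to indices into the
/// file schema. Returns `None` if a referenced column is not in the file.
/// Assumption: indices are deduplicated and kept in ascending order.
fn required_indices(file_schema: &[Field], referenced: &[&str]) -> Option<Vec<usize>> {
    let mut indices = BTreeSet::new();
    for name in referenced {
        let idx = file_schema.iter().position(|f| f.name == *name)?;
        indices.insert(idx);
    }
    Some(indices.into_iter().collect())
}

/// Build the filter schema by projecting the file schema with those indices.
/// Panics if an index is out of bounds for `file_schema`.
fn project(file_schema: &[Field], projection: &[usize]) -> Vec<Field> {
    projection.iter().map(|&i| file_schema[i].clone()).collect()
}

fn main() {
    let file_schema = vec![field("a", "Int64"), field("b", "Utf8"), field("c", "Float64")];

    // A filter such as `c > a` references columns `c` and `a`.
    let projection = required_indices(&file_schema, &["c", "a"]).expect("columns exist");
    let filter_schema = project(&file_schema, &projection);

    println!("projection = {:?}", projection);
    println!("filter_schema = {:?}", filter_schema);
}
```

With a single schema, the indices in `projection` and the fields in `filter_schema` are derived from the same source, so there is no second schema to keep consistent with the first.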


