Re: [PR] Add support for parquet field [datafusion]

via GitHub Thu, 19 Feb 2026 08:41:03 -0800


andygrove commented on code in PR #20370:
URL: https://github.com/apache/datafusion/pull/20370#discussion_r2828908468



##########
datafusion/datasource-parquet/src/metadata.rs:
##########
@@ -68,6 +68,55 @@ pub struct DFParquetMetadata<'a> {
     file_metadata_cache: Option<Arc<dyn FileMetadataCache>>,
     /// timeunit to coerce INT96 timestamps to
     pub coerce_int96: Option<TimeUnit>,
+    /// Whether to extract and use Parquet field IDs for column resolution
+    pub enable_field_ids: bool,
+}
+
+/// Extracts Parquet field IDs and stores them in Arrow field metadata
+/// under the key "PARQUET:field_id"
+///
+/// # Limitations
+///
+/// TODO: Currently only supports flat schemas (top-level primitive fields).

Review Comment:
   What happens if I enable the feature and try to read a Parquet file with 
complex types?
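
   To make the question concrete, one possible behavior is to fail loudly on
nested fields rather than silently skipping their IDs. The sketch below is
only an illustration under that assumption: it uses plain standard-library
types, and `SimpleField`, `attach_field_ids`, and the error message are
hypothetical names, not the PR's code or the Arrow/Parquet API. Only the
metadata key "PARQUET:field_id" comes from the doc comment in the diff.

```rust
use std::collections::HashMap;

/// Metadata key named in the doc comment under review.
const FIELD_ID_KEY: &str = "PARQUET:field_id";

/// Hypothetical stand-in for a schema field (NOT the Arrow or Parquet type).
#[derive(Debug, Clone)]
struct SimpleField {
    name: String,
    /// Field ID as it might appear in the Parquet schema, if present.
    field_id: Option<i32>,
    /// Non-empty for complex types (struct, list, map).
    children: Vec<SimpleField>,
    metadata: HashMap<String, String>,
}

impl SimpleField {
    fn leaf(name: &str, field_id: Option<i32>) -> Self {
        Self {
            name: name.to_string(),
            field_id,
            children: Vec::new(),
            metadata: HashMap::new(),
        }
    }

    fn nested(name: &str, children: Vec<SimpleField>) -> Self {
        Self {
            name: name.to_string(),
            field_id: None,
            children,
            metadata: HashMap::new(),
        }
    }
}

/// Copies each top-level field ID into the field's metadata.
/// Assumption for illustration: any nested field is rejected with an error
/// instead of being passed through without its IDs.
fn attach_field_ids(fields: &[SimpleField]) -> Result<Vec<SimpleField>, String> {
    fields
        .iter()
        .map(|f| {
            if !f.children.is_empty() {
                return Err(format!(
                    "field IDs are not supported for nested field '{}'",
                    f.name
                ));
            }
            let mut out = f.clone();
            if let Some(id) = f.field_id {
                out.metadata.insert(FIELD_ID_KEY.to_string(), id.to_string());
            }
            Ok(out)
        })
        .collect()
}
```

   With this shape, a flat schema gets its IDs recorded and a schema with a
struct column returns an error that names the offending field. Whether the
PR errors, falls back to name-based resolution, or ignores nested IDs is
exactly what the question asks.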



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


