Re: [PR] Add support for parquet field [datafusion]

via GitHub Thu, 19 Feb 2026 08:40:08 -0800


andygrove commented on code in PR #20370:
URL: https://github.com/apache/datafusion/pull/20370#discussion_r2828904006



##########
datafusion/datasource-parquet/src/metadata.rs:
##########
@@ -68,6 +68,55 @@ pub struct DFParquetMetadata<'a> {
     file_metadata_cache: Option<Arc<dyn FileMetadataCache>>,
     /// timeunit to coerce INT96 timestamps to
     pub coerce_int96: Option<TimeUnit>,
+    /// Whether to extract and use Parquet field IDs for column resolution
+    pub enable_field_ids: bool,
+}
+
+/// Extracts Parquet field IDs and stores them in Arrow field metadata
+/// under the key "PARQUET:field_id"
+///
+/// # Limitations
+///
+/// TODO: Currently only supports flat schemas (top-level primitive fields).
+/// Nested field IDs within structs, lists, and maps are not yet supported.
+/// This requires recursive traversal of the Parquet schema tree to extract
+/// field IDs at all nesting levels. See PARQUET_FIELD_ID_IMPLEMENTATION.md

Review Comment:
   Could you remove the reference to this markdown file, since it isn't part of 
this PR?
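
   For context, the recursive traversal that the TODO describes could be
sketched roughly as below. This is a standalone illustration only: `SchemaNode`
and `collect_field_ids` are hypothetical names invented for this sketch, not
types or functions from the parquet or arrow crates and not code from this PR.
It assumes a schema tree where each node may carry an optional field ID and
zero or more children, and it records a dotted path for every node that has an
ID.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for a Parquet schema node.
/// (Not the real parquet crate type; for illustration only.)
#[derive(Debug, Clone)]
pub struct SchemaNode {
    pub name: String,
    /// Field ID, if the writer recorded one for this node.
    pub field_id: Option<i32>,
    /// Child nodes; empty for primitive (leaf) fields.
    pub children: Vec<SchemaNode>,
}

impl SchemaNode {
    /// Builds a primitive (leaf) node.
    pub fn leaf(name: &str, field_id: Option<i32>) -> Self {
        SchemaNode {
            name: name.to_string(),
            field_id,
            children: Vec::new(),
        }
    }

    /// Builds a group node (struct-like) with the given children.
    pub fn group(name: &str, field_id: Option<i32>, children: Vec<SchemaNode>) -> Self {
        SchemaNode {
            name: name.to_string(),
            field_id,
            children,
        }
    }
}

/// Walks the schema depth-first and returns a map from dotted column path
/// to field ID, covering every nesting level. Nodes without an ID are skipped,
/// but their children are still visited.
pub fn collect_field_ids(root_fields: &[SchemaNode]) -> BTreeMap<String, i32> {
    fn visit(node: &SchemaNode, prefix: &str, out: &mut BTreeMap<String, i32>) {
        let path = if prefix.is_empty() {
            node.name.clone()
        } else {
            format!("{}.{}", prefix, node.name)
        };
        if let Some(id) = node.field_id {
            out.insert(path.clone(), id);
        }
        for child in &node.children {
            visit(child, &path, out);
        }
    }

    let mut out = BTreeMap::new();
    for field in root_fields {
        visit(field, "", &mut out);
    }
    out
}
```

   In this sketch, a top-level `id` column and a nested `address.city` column
would both appear in the returned map, whereas a flat-only extraction would
see only the top-level entries.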



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


