[PR] fix: build_fallback_field_id_map produces incorrect column indices for schemas with nested types [iceberg-rust]

via GitHub Tue, 31 Mar 2026 15:53:40 -0700


mbutrovich opened a new pull request, #2307:
URL: https://github.com/apache/iceberg-rust/pull/2307


   ## Which issue does this PR close?
   
   - Closes #2306.
   
   ## What changes are included in this PR?
   
   `build_fallback_field_id_map` iterated over Parquet leaf columns instead of 
top-level fields when building the field-ID-to-column-index mapping for 
migrated files (files with no embedded field IDs). When a nested type (struct, 
list, map) precedes a primitive column, it expands into multiple leaves, so the 
mapping diverges from `add_fallback_field_ids_to_arrow_schema`, which 
correctly assigns ordinal IDs to top-level Arrow fields. As a result, a 
predicate on a column that follows a nested type resolved to a leaf inside the 
group, crashing with 
"Leaf column `id` in predicates isn't a root column in Parquet schema".
   
   The fix uses `SchemaDescriptor::get_column_root_idx` to map each leaf back 
to its top-level field position, only creating entries for primitive root 
columns. This matches iceberg-java's `ParquetSchemaUtil.addFallbackIds()`.
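
To illustrate the divergence, here is a minimal, self-contained sketch that does not use the `parquet` crate. `TopLevelField`, `leaf_to_root`, `leaf_ordinal_map`, `root_ordinal_map`, and `example_fields` are hypothetical names for this illustration only, and the sketch assumes fallback IDs are 1-based ordinals. `leaf_to_root` stands in for what `SchemaDescriptor::get_column_root_idx` reports per leaf; the actual change in this PR may differ in detail.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a top-level Parquet field.
struct TopLevelField {
    name: &'static str,
    /// Number of leaf columns this field expands to (1 for a primitive).
    leaf_count: usize,
    is_primitive: bool,
}

/// For each leaf index, the index of the top-level field that contains it.
/// This plays the role of `get_column_root_idx` in the toy model.
fn leaf_to_root(fields: &[TopLevelField]) -> Vec<usize> {
    let mut roots = Vec::new();
    for (root_idx, field) in fields.iter().enumerate() {
        for _ in 0..field.leaf_count {
            roots.push(root_idx);
        }
    }
    roots
}

/// Old behavior as described above: one entry per leaf, ID = leaf index + 1.
fn leaf_ordinal_map(fields: &[TopLevelField]) -> HashMap<i32, usize> {
    (0..leaf_to_root(fields).len())
        .map(|leaf_idx| ((leaf_idx + 1) as i32, leaf_idx))
        .collect()
}

/// Fixed behavior as described above: ID = top-level position + 1,
/// with entries only for primitive root columns.
fn root_ordinal_map(fields: &[TopLevelField]) -> HashMap<i32, usize> {
    let mut map = HashMap::new();
    for (leaf_idx, &root_idx) in leaf_to_root(fields).iter().enumerate() {
        if fields[root_idx].is_primitive {
            map.insert((root_idx + 1) as i32, leaf_idx);
        }
    }
    map
}

/// A struct with two leaves followed by a primitive `id` column.
fn example_fields() -> Vec<TopLevelField> {
    vec![
        TopLevelField { name: "person", leaf_count: 2, is_primitive: false },
        TopLevelField { name: "id", leaf_count: 1, is_primitive: true },
    ]
}

fn main() {
    let fields = example_fields();
    // `id` is the second top-level field, so its fallback ID is 2.
    let id_field_id = 2;
    println!("top-level fields: {:?}", fields.iter().map(|f| f.name).collect::<Vec<_>>());
    // Leaf-based mapping points ID 2 at leaf 1, a leaf inside `person`.
    println!("leaf-based: {:?}", leaf_ordinal_map(&fields).get(&id_field_id));
    // Root-based mapping points ID 2 at leaf 2, the `id` column itself.
    println!("root-based: {:?}", root_ordinal_map(&fields).get(&id_field_id));
}
```

In this toy schema the leaf-based map sends field ID 2 to a leaf of the struct, which is the kind of mismatch that produces the "isn't a root column" error, while the root-based map sends it to the `id` leaf and creates no entry for the struct.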
   
   Also corrects the typo "Leave column" to "Leaf column" in error messages.
   
   ## Are these changes tested?
   
   Yes. Three integration tests 
(`test_predicate_on_migrated_file_with_{struct,list,map}`) write Parquet 
files without field IDs that contain a nested type before an `id` column, then 
read them back with a predicate on `id`. All three reproduce the exact crash 
before the fix.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

