mbutrovich opened a new pull request, #2307:
URL: https://github.com/apache/iceberg-rust/pull/2307
## Which issue does this PR close?
- Closes #2306.
## What changes are included in this PR?
`build_fallback_field_id_map` iterated over Parquet leaf columns instead of
top-level fields when it built the field-ID-to-column-index mapping for
migrated files (files without embedded field IDs). Nested types (struct, list,
map) can expand into multiple leaves. When one precedes a primitive column, the
leaf-based mapping diverges from `add_fallback_field_ids_to_arrow_schema`,
which correctly assigns ordinal IDs to top-level Arrow fields. As a result, a
predicate on a column that follows a nested type resolved to a leaf inside the
group and crashed with
"Leaf column `id` in predicates isn't a root column in Parquet schema".
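A minimal std-only sketch of the mismatch, for a file whose schema is a struct `s` (leaves `s.a`, `s.b`) followed by a primitive `id`. `TopLevel`, `top_level_ids`, and `leaf_based_map` are hypothetical stand-ins, not iceberg-rust or `parquet` crate APIs. The 1-based ordinal IDs are an assumption modeled on iceberg-java's fallback convention:

```rust
use std::collections::HashMap;

/// Hypothetical model of a top-level Parquet field in a migrated file:
/// a primitive column has one leaf, a group (struct/list/map) has `n` leaves.
enum TopLevel {
    Primitive(&'static str),
    Group(&'static str, usize),
}

/// Ordinal fallback IDs for top-level fields (1-based, assumed), modeling
/// what `add_fallback_field_ids_to_arrow_schema` assigns to Arrow fields.
fn top_level_ids(fields: &[TopLevel]) -> HashMap<&'static str, i32> {
    fields
        .iter()
        .enumerate()
        .map(|(i, field)| {
            let name = match field {
                TopLevel::Primitive(name) | TopLevel::Group(name, _) => *name,
            };
            (name, (i + 1) as i32)
        })
        .collect()
}

/// Model of the pre-fix behavior: one entry per *leaf*, so each ID is
/// derived from the leaf index rather than the top-level position.
fn leaf_based_map(fields: &[TopLevel]) -> HashMap<i32, usize> {
    let leaf_count: usize = fields
        .iter()
        .map(|field| match field {
            TopLevel::Primitive(_) => 1,
            TopLevel::Group(_, leaves) => *leaves,
        })
        .sum();
    (0..leaf_count).map(|leaf| ((leaf + 1) as i32, leaf)).collect()
}
```

With `[TopLevel::Group("s", 2), TopLevel::Primitive("id")]`, `top_level_ids` gives `id` the ID 2, while `leaf_based_map` resolves ID 2 to leaf 1 (`s.b`, inside the struct) instead of leaf 2 (`id`).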
The fix uses `SchemaDescriptor::get_column_root_idx` to map each leaf back
to its top-level field position and creates entries only for primitive root
columns, matching the behavior of iceberg-java's `ParquetSchemaUtil.addFallbackIds()`.
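A minimal std-only sketch of the corrected mapping, assuming the leaf-to-root indices have already been obtained (in iceberg-rust, from `SchemaDescriptor::get_column_root_idx`). The function name and slice-based signature are hypothetical simplifications, not the actual iceberg-rust code:

```rust
use std::collections::HashMap;

/// Sketch of the fixed fallback mapping (hypothetical signature).
///
/// `leaf_roots[i]` is the top-level field index that leaf column `i` belongs
/// to, i.e. what `get_column_root_idx(i)` would report.
/// `root_is_primitive[r]` says whether top-level field `r` is a primitive
/// column rather than a struct, list, or map group.
///
/// Returns fallback field ID (1-based top-level ordinal) -> leaf column index.
fn fallback_field_id_map(leaf_roots: &[usize], root_is_primitive: &[bool]) -> HashMap<i32, usize> {
    let mut map = HashMap::new();
    for (leaf_idx, &root_idx) in leaf_roots.iter().enumerate() {
        // Leaves nested in a group share the group's root index; skipping them
        // keeps IDs aligned with top-level fields and ensures no ID resolves
        // to a leaf inside a group.
        if root_is_primitive[root_idx] {
            map.insert((root_idx + 1) as i32, leaf_idx);
        }
    }
    map
}
```

For a struct followed by `id` (leaf roots `[0, 0, 1]`, root 0 a group), the map holds only `2 -> 2`, so field ID 2 resolves to the `id` leaf and no entry points inside the struct.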
Also renames "Leave column" to "Leaf column" in error messages.
## Are these changes tested?
Yes. Three integration tests
(`test_predicate_on_migrated_file_with_{struct,list,map}`) write Parquet files
that have no embedded field IDs and contain a nested type before an `id`
column, then read them back with a predicate on `id`. All three reproduce the
exact crash without the fix.
--
This is an automated message from the Apache Git Service.