mbutrovich commented on PR #1821:
URL: https://github.com/apache/iceberg-rust/pull/1821#issuecomment-3497818272
I'm struggling to articulate what I'm seeing, so I asked Claude to help me
frame it as a question:
**Question**: Should partition constants take priority over Parquet field ID
matches?
**Scenario**: The `TestAddFilesProcedure.addPartitionToPartitioned` test
writes a partitioned Parquet file whose field IDs have been renumbered:
- Iceberg schema: field_id=1 → "id" (partition column)
- Parquet file: field_id=1 → "name" (renumbered, "id" excluded)
**Java behavior** (`BaseParquetReaders.java:299-314`):
```java
if (idToConstant.containsKey(id)) {
  // Use partition constant
} else if (reader != null) {
  // Use Parquet column by field ID
}
```
Java checks partition constants first, giving them priority over field ID
matches.
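A minimal, hypothetical sketch of that constants-first order, assuming a simplified model in which a file's partition constants and its Parquet columns are both plain maps keyed by field ID. `ConstantsFirstResolver`, `resolve`, the `"constant:"`/`"column:"` result strings and the partition value `42` are illustrative only. They are not the real `BaseParquetReaders` API.

```java
import java.util.Map;

// Hypothetical, simplified model of resolving one projected Iceberg field.
final class ConstantsFirstResolver {
    private ConstantsFirstResolver() {}

    // idToConstant: constants known for the file (e.g. identity partition values), keyed by field ID.
    // parquetIdToColumn: Parquet column names keyed by the field ID written into the file.
    static String resolve(int fieldId,
                          Map<Integer, Object> idToConstant,
                          Map<Integer, String> parquetIdToColumn) {
        if (idToConstant.containsKey(fieldId)) {
            // A constant wins even when the file also has a column carrying this field ID.
            return "constant:" + idToConstant.get(fieldId);
        } else if (parquetIdToColumn.containsKey(fieldId)) {
            // No constant: read the Parquet column matched by field ID.
            return "column:" + parquetIdToColumn.get(fieldId);
        }
        // Neither source: the remaining spec fallback rules are omitted in this sketch.
        return null;
    }
}
```

With the scenario above (Iceberg field 1 is `id`, and the renumbered file uses field ID 1 for `name`), `resolve(1, Map.of(1, 42), Map.of(1, "name"))` returns `"constant:42"`. The renumbered `name` column is never consulted for `id`.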
**Spec** (https://iceberg.apache.org/spec/#column-projection) says:
> Columns in Iceberg data files are selected by field id....Values for field
> ids which are not present in a data file must be resolved according to...
> [fallback rules]
When field_id=1 exists in Parquet but points to the wrong column, should it
be considered "not present"? Or is Java's "constants-first" approach an
implementation detail to handle Spark's field ID renumbering?
Should iceberg-rust match Java's behavior (check constants first), or
strictly follow "select by field id" (check Parquet first)?
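For contrast, here is a hypothetical sketch of the strict "select by field id" order. It uses the same simplified map-based model, and `FieldIdFirstResolver`, `resolve`, the result strings and the value `42` are again illustrative names rather than iceberg-rust or Iceberg Java APIs. Only field IDs that are absent from the Parquet file fall back to the partition constant.

```java
import java.util.Map;

// Hypothetical sketch of a strict "select by field id" ordering.
final class FieldIdFirstResolver {
    private FieldIdFirstResolver() {}

    static String resolve(int fieldId,
                          Map<Integer, Object> idToConstant,
                          Map<Integer, String> parquetIdToColumn) {
        if (parquetIdToColumn.containsKey(fieldId)) {
            // Any Parquet column carrying this field ID is trusted, even if the IDs were renumbered.
            return "column:" + parquetIdToColumn.get(fieldId);
        } else if (idToConstant.containsKey(fieldId)) {
            // The field ID is "not present" in the file, so fall back to the partition constant.
            return "constant:" + idToConstant.get(fieldId);
        }
        // The remaining spec fallback rules are omitted in this sketch.
        return null;
    }
}
```

On the renumbered file, `resolve(1, Map.of(1, 42), Map.of(1, "name"))` returns `"column:name"`. Under this ordering the `id` field would be populated from the `name` column, while the constants-first sketch would return the partition constant.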