mbutrovich commented on PR #1821:
URL: https://github.com/apache/iceberg-rust/pull/1821#issuecomment-3497818272

   Claude gave me a question to ask based on what I'm seeing, since I'm struggling to articulate it myself.
   
   **Question**: Should partition constants take priority over Parquet field ID matches?
   
   **Scenario**: The `TestAddFilesProcedure.addPartitionToPartitioned` test writes a partitioned Parquet file whose field IDs have been renumbered:
   - Iceberg schema: field_id=1 → "id" (partition column)
   - Parquet file: field_id=1 → "name" (renumbered, "id" excluded)
   
   Java behavior (BaseParquetReaders.java:299-314):
   ```java
   if (idToConstant.containsKey(id)) {
       // Use partition constant
   } else if (reader != null) {
       // Use Parquet column by field ID
   }
   ```
   Java checks partition constants first, giving them priority over field ID matches.
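   Here is a minimal, self-contained sketch of the constants-first ordering applied to the scenario above. It is not the actual `BaseParquetReaders` code. The class `ConstantsFirst`, the `resolve` helper, the string-valued "readers", and the partition value `42` are hypothetical stand-ins chosen for illustration.

   ```java
   import java.util.Map;

   // Hypothetical model: each "reader" is reduced to a string naming its source.
   // None of these names are the real BaseParquetReaders API.
   public class ConstantsFirst {

       /** Resolve an Iceberg field ID, checking partition constants before Parquet columns. */
       static String resolve(int fieldId,
                             Map<Integer, Object> idToConstant,
                             Map<Integer, String> parquetColumnsById) {
           if (idToConstant.containsKey(fieldId)) {
               // The partition constant wins, even if the file also carries this field ID.
               return "constant:" + idToConstant.get(fieldId);
           }
           String column = parquetColumnsById.get(fieldId);
           if (column != null) {
               // Otherwise read the Parquet column whose field ID matches.
               return "parquet:" + column;
           }
           // Neither source has the field; the spec's remaining fallback rules
           // would apply here (not modeled in this sketch).
           return "missing";
       }

       public static void main(String[] args) {
           // Iceberg schema: field_id=1 -> "id", an identity partition column
           // with a hypothetical partition value of 42.
           Map<Integer, Object> idToConstant = Map.of(1, 42);
           // Renumbered Parquet file: field_id=1 -> "name"; "id" is not stored in the file.
           Map<Integer, String> parquetColumnsById = Map.of(1, "name");

           System.out.println(resolve(1, idToConstant, parquetColumnsById)); // prints constant:42
       }
   }
   ```

   Because the constants are checked first, field_id=1 resolves to the partition value, and the renumbered Parquet column "name" is never consulted.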
   
   Spec says (https://iceberg.apache.org/spec/#column-projection):
   "Columns in Iceberg data files are selected by field id. ... Values for field ids which are not present in a data file must be resolved according to..." [fallback rules]
   
   When field_id=1 exists in Parquet but points to the wrong column, should it be considered "not present"? Or is Java's "constants-first" approach an implementation detail to handle Spark's field ID renumbering?
   
   Should iceberg-rust match Java's behavior (check constants first), or strictly follow "select by field id" (check Parquet first)?
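   For contrast, here is a sketch of the strict "select by field id" order using the same kind of hypothetical model. `FieldIdFirst`, `resolve`, and the partition value `42` are illustrative only and are not iceberg-rust or Iceberg Java API. It shows why the question matters: with the Parquet check first, the renumbered file's "name" column is read as "id".

   ```java
   import java.util.Map;

   // Hypothetical model: each "reader" is reduced to a string naming its source.
   public class FieldIdFirst {

       /** Resolve an Iceberg field ID, checking Parquet field IDs before partition constants. */
       static String resolve(int fieldId,
                             Map<Integer, Object> idToConstant,
                             Map<Integer, String> parquetColumnsById) {
           String column = parquetColumnsById.get(fieldId);
           if (column != null) {
               // Any Parquet column with a matching field ID is used, even if it is
               // actually a different logical column after renumbering.
               return "parquet:" + column;
           }
           if (idToConstant.containsKey(fieldId)) {
               // Only reached when the field ID is absent from the file.
               return "constant:" + idToConstant.get(fieldId);
           }
           // Remaining spec fallback rules would apply here (not modeled).
           return "missing";
       }

       public static void main(String[] args) {
           Map<Integer, Object> idToConstant = Map.of(1, 42);       // "id" partition value (hypothetical)
           Map<Integer, String> parquetColumnsById = Map.of(1, "name"); // renumbered file

           System.out.println(resolve(1, idToConstant, parquetColumnsById)); // prints parquet:name
       }
   }
   ```

   On the same inputs, this order returns the "name" column for "id". That is the mismatch the constants-first check avoids for this test.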
