mbutrovich commented on PR #1821:
URL: https://github.com/apache/iceberg-rust/pull/1821#issuecomment-3517189896
I took a look at the Java code again, and it handles this differently. Let me see if I can get the logic to work in a way that's compatible with arrow-rs's Parquet reader by passing in a modified schema.
Location:
`iceberg/parquet/src/main/java/org/apache/iceberg/parquet/ReadConf.java:80-89`
```java
MessageType typeWithIds;
if (ParquetSchemaUtil.hasIds(fileSchema)) {
  typeWithIds = fileSchema;
  this.projection = ParquetSchemaUtil.pruneColumns(fileSchema, expectedSchema);
} else if (nameMapping != null) {
  typeWithIds = ParquetSchemaUtil.applyNameMapping(fileSchema, nameMapping);
  this.projection = ParquetSchemaUtil.pruneColumns(typeWithIds, expectedSchema);
}
```
Java checks `hasIds()` on the Parquet file's schema up front. If the file has no field IDs, `applyNameMapping()` rewrites the schema, assigning field IDs based on column names, before any data is read. Let me see if I can get similar behavior in the `ArrowReader` instead of the `RecordBatchTransformer`.
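To make the idea concrete, here's a rough sketch of what the schema-modification step could look like on the Rust side. It's not taken from this PR: the helper name and signature are hypothetical, a flat `name_to_id` map stands in for Iceberg's `NameMapping`, and only top-level fields are handled (a real version would recurse into structs, lists, and maps). It leans on arrow-rs's `PARQUET:field_id` metadata convention, exposed as `parquet::arrow::PARQUET_FIELD_ID_META_KEY`.
```rust
use std::collections::HashMap;
use std::sync::Arc;

use arrow_schema::{Field, Schema};
use parquet::arrow::PARQUET_FIELD_ID_META_KEY;

/// Hypothetical helper: given the file's Arrow schema (as decoded by
/// arrow-rs's Parquet reader) and a name -> field-id map derived from the
/// table's name mapping, return a schema whose fields carry
/// `PARQUET:field_id` metadata, mirroring Java's `applyNameMapping()`.
fn apply_name_mapping(schema: &Schema, name_to_id: &HashMap<String, i32>) -> Schema {
    let fields: Vec<Arc<Field>> = schema
        .fields()
        .iter()
        .map(|field| {
            // Leave fields that already carry a field ID untouched,
            // matching the Java `hasIds()` short-circuit.
            if field.metadata().contains_key(PARQUET_FIELD_ID_META_KEY) {
                return field.clone();
            }
            match name_to_id.get(field.name()) {
                Some(id) => {
                    let mut metadata = field.metadata().clone();
                    metadata.insert(PARQUET_FIELD_ID_META_KEY.to_string(), id.to_string());
                    Arc::new(field.as_ref().clone().with_metadata(metadata))
                }
                // No mapping entry: keep the field as-is; the projection
                // step decides whether that's an error.
                None => field.clone(),
            }
        })
        .collect();
    Schema::new_with_metadata(fields, schema.metadata().clone())
}
```
The modified schema could then be handed to the `ParquetRecordBatchStreamBuilder` so that projection and pruning see consistent field IDs, rather than patching batches after the fact in the `RecordBatchTransformer`.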
Thanks for continuing to give feedback on this!