mbutrovich commented on PR #1821:
URL: https://github.com/apache/iceberg-rust/pull/1821#issuecomment-3517189896
I took a look at the Java code again, and it handles this differently. Let me see if I can get the logic to work in a way that's compatible with arrow-rs's Parquet reader by passing in a modified schema.
Location:
`iceberg/parquet/src/main/java/org/apache/iceberg/parquet/ReadConf.java:80-89`
```java
MessageType typeWithIds;
if (ParquetSchemaUtil.hasIds(fileSchema)) {
  typeWithIds = fileSchema;
  this.projection = ParquetSchemaUtil.pruneColumns(fileSchema, expectedSchema);
} else if (nameMapping != null) {
  typeWithIds = ParquetSchemaUtil.applyNameMapping(fileSchema, nameMapping);
  this.projection = ParquetSchemaUtil.pruneColumns(typeWithIds, expectedSchema);
}
```
Java checks `hasIds()` on the Parquet file's schema up front. If the file has no field IDs, `applyNameMapping()` rewrites the schema, assigning field IDs based on column names, before any data is read. Let me see if I can get similar behavior in the `ArrowReader` instead of the `RecordBatchTransformer`.
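To make the idea concrete, here's a rough sketch of what the schema-modification step could look like on the Rust side. It's not taken from this PR: the helper name and signature are hypothetical, a flat `name_to_id` map stands in for Iceberg's `NameMapping`, and only top-level fields are handled (a real version would recurse into structs, lists, and maps). It leans on arrow-rs's `PARQUET:field_id` metadata convention, exposed as `parquet::arrow::PARQUET_FIELD_ID_META_KEY`.
```rust
use std::collections::HashMap;
use std::sync::Arc;

use arrow_schema::{Field, Schema};
use parquet::arrow::PARQUET_FIELD_ID_META_KEY;

/// Hypothetical helper: given the file's Arrow schema (as decoded by
/// arrow-rs's Parquet reader) and a name -> field-id map derived from the
/// table's name mapping, return a schema whose fields carry
/// `PARQUET:field_id` metadata, mirroring Java's `applyNameMapping()`.
fn apply_name_mapping(schema: &Schema, name_to_id: &HashMap<String, i32>) -> Schema {
    let fields: Vec<Arc<Field>> = schema
        .fields()
        .iter()
        .map(|field| {
            // Leave fields that already carry a field ID untouched,
            // matching the Java `hasIds()` short-circuit.
            if field.metadata().contains_key(PARQUET_FIELD_ID_META_KEY) {
                return field.clone();
            }
            match name_to_id.get(field.name()) {
                Some(id) => {
                    let mut metadata = field.metadata().clone();
                    metadata.insert(PARQUET_FIELD_ID_META_KEY.to_string(), id.to_string());
                    Arc::new(field.as_ref().clone().with_metadata(metadata))
                }
                // No mapping entry: keep the field as-is; the projection
                // step decides whether that's an error.
                None => field.clone(),
            }
        })
        .collect();
    Schema::new_with_metadata(fields, schema.metadata().clone())
}
```
The modified schema could then be handed to the `ParquetRecordBatchStreamBuilder` so that projection and pruning see consistent field IDs, rather than patching batches after the fact in the `RecordBatchTransformer`.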
Thanks for continuing to give feedback on this!