brunsgaard opened a new issue, #5875: URL: https://github.com/apache/paimon/issues/5875
### Search before asking

- [x] I searched in the [issues](https://github.com/apache/paimon/issues) and found nothing similar.

### Paimon version

1.2.0

### Compute Engine

Flink 1.20.1

### Minimal reproduce step

I am inspecting the metadata and cannot reproduce the issue myself. I hope this report is still useful to the community.

### What doesn't meet your expectations?

When writing Iceberg-compatible manifest files, the generated Avro schema currently omits `field-id` annotations on nested fields such as `null_value_counts`. This omission leads to compatibility issues with query engines and tools that expect fully compliant Iceberg metadata, such as BigQuery and PyIceberg.

For example, the current schema for `null_value_counts`, as it appears in the Paimon compat metadata file, is:

```json
{
  "name": "null_value_counts",
  "type": [
    "null",
    {
      "type": "array",
      "items": {
        "type": "record",
        "name": "r2_null_value_counts",
        "fields": [
          { "name": "key", "type": "int" },
          { "name": "value", "type": "long" }
        ]
      },
      "logicalType": "map"
    }
  ],
  "default": null
}
```

However, the expected schema (as produced by PyIceberg) includes explicit `field-id` annotations:

```json
{
  "name": "null_value_counts",
  "type": [
    "null",
    {
      "type": "array",
      "items": {
        "type": "record",
        "name": "k121_v122",
        "fields": [
          { "name": "key", "type": "int", "field-id": 121 },
          { "name": "value", "type": "long", "field-id": 122 }
        ]
      },
      "logicalType": "map"
    }
  ],
  "doc": "Map of column id to null value count",
  "default": null,
  "field-id": 110
}
```

It would be great if Paimon could update its manifest generation logic to include `field-id` metadata where required by the Iceberg spec. This small change would improve out-of-the-box interoperability with downstream engines.

For more detail, see the Avro section of the Iceberg spec, https://iceberg.apache.org/spec/#avro:

> Iceberg struct, list, and map types identify nested types by ID.
> When writing data to Avro files, these IDs must be stored in the Avro schema to support ID-based column pruning.

### Anything else?

_No response_

### Are you willing to submit a PR?

- [ ] I'm willing to submit a PR!

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
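
For anyone who wants to check a manifest schema for the omission reported here, the following is a minimal sketch, not part of Paimon or PyIceberg. It walks an Avro schema (already parsed from JSON) and lists every record field that carries no `field-id` attribute. The helper names `missing_field_ids` and `wrap` are hypothetical, the two sample fields are copied from the report, and the check deliberately ignores the other ID attributes the Iceberg spec defines for Avro (`element-id`, `key-id`, `value-id`).

```python
import json


def missing_field_ids(schema, path=""):
    """Return dotted paths of record fields that have no "field-id" attribute."""
    missing = []
    if isinstance(schema, list):
        # Avro union, e.g. ["null", {...}]: inspect every branch.
        for branch in schema:
            missing += missing_field_ids(branch, path)
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind == "record":
            for field in schema.get("fields", []):
                field_path = f"{path}.{field['name']}" if path else field["name"]
                if "field-id" not in field:
                    missing.append(field_path)
                missing += missing_field_ids(field["type"], field_path)
        elif kind == "array":
            missing += missing_field_ids(schema["items"], path)
        elif kind == "map":
            missing += missing_field_ids(schema["values"], path)
        elif isinstance(kind, (dict, list)):
            # Nested type definition such as {"type": {"type": "record", ...}}.
            missing += missing_field_ids(kind, path)
    # Primitive type names (plain strings) have nothing to check.
    return missing


def wrap(field):
    """Put a single field definition into a throwaway record so it can be walked."""
    return {"type": "record", "name": "demo", "fields": [field]}


# Field as written by Paimon, per the report (no field-id anywhere).
PAIMON_FIELD = json.loads("""
{"name": "null_value_counts",
 "type": ["null", {"type": "array", "logicalType": "map",
   "items": {"type": "record", "name": "r2_null_value_counts",
     "fields": [{"name": "key", "type": "int"},
                {"name": "value", "type": "long"}]}}],
 "default": null}
""")

# Field as written by PyIceberg, per the report.
EXPECTED_FIELD = json.loads("""
{"name": "null_value_counts",
 "type": ["null", {"type": "array", "logicalType": "map",
   "items": {"type": "record", "name": "k121_v122",
     "fields": [{"name": "key", "type": "int", "field-id": 121},
                {"name": "value", "type": "long", "field-id": 122}]}}],
 "doc": "Map of column id to null value count",
 "default": null, "field-id": 110}
""")

print(missing_field_ids(wrap(PAIMON_FIELD)))
# → ['null_value_counts', 'null_value_counts.key', 'null_value_counts.value']
print(missing_field_ids(wrap(EXPECTED_FIELD)))
# → []
```

To check a real manifest, the writer schema would first have to be extracted from the Avro file header with an Avro library and then passed to `missing_field_ids`; that step is left out because it depends on the tooling at hand.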
