muttcg opened a new pull request, #5981:
URL: https://github.com/apache/paimon/pull/5981

   ### Purpose
   
   <!-- Linking this pull request to the issue -->
   Linked issue: close #5875 
   
   The Apache Iceberg specification requires Avro schemas to carry field IDs 
[to support ID-based column pruning](https://iceberg.apache.org/spec/#avro).
   
   For example, Google BigQuery and PyIceberg do not work at all without 
field IDs, failing with errors such as:
   
   ```
   Error while reading data, error message: The Apache Avro library failed to 
read data with the following error: Cannot resolve: { "type": "record", "name": 
"r2_null_value_counts", "fields": [ { "name": "key", "type": "int" }, { "name": 
"value", "type": "long" } ] } with { "type": "record", "name": "k121_v122", 
"fields": [ { "name": "key", "type": "int", "field-id": 121 }, { "name": 
"value", "type": "long", "field-id": 122 } ] }; Failed to dispatch pruner query 
for unity-data-ads-test.example.flair_paimon_jacob.meta original query id: *** 
with new query_id: *** File: 
bigstore/*****-test/table_v2/iceberg/example/flair_paimon/metadata/7d8b21d4-e536-49cd-bc1c-ee68b154c178-m8.avro
   ```
   
   The change affects only Iceberg Avro schema creation: it adds the Iceberg 
custom properties required by the specification and fixes the wrong ID 
for `partitions` values.
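
As a rough illustration of what the specification asks for, the sketch below builds the key/value record from the expected schema in the error message, with a `field-id` property on each field. It uses only Python's `json` module and is not Paimon's Java implementation. The helper name `avro_field` is hypothetical, and the IDs 121 and 122 are taken directly from the error message.

```python
import json


def avro_field(name, avro_type, field_id):
    """Return an Avro field definition carrying an Iceberg `field-id` property.

    Hypothetical helper for illustration only.
    """
    return {"name": name, "type": avro_type, "field-id": field_id}


# Key/value record as it appears in the expected schema of the error message.
key_value_record = {
    "type": "record",
    "name": "k121_v122",
    "fields": [
        avro_field("key", "int", 121),
        avro_field("value", "long", 122),
    ],
}

# Avro keeps unknown attributes such as `field-id` as custom properties,
# so they survive in the serialized schema JSON.
print(json.dumps(key_value_record, indent=2))
```

A reader that resolves columns by ID can then match `key` and `value` through these properties instead of relying on record and field names.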
   
   ### Tests
   All existing tests must pass. Added the 
`org.apache.paimon.iceberg.IcebergCompatibilityTest.testIcebergAvroFieldIds` test 
to cover all required IDs.
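
The kind of check such a test performs can be sketched as follows. This is an assumption about the general idea, not the code of `testIcebergAvroFieldIds`: the hypothetical function `missing_field_ids` walks an Avro schema given as parsed JSON and reports record fields that lack a `field-id`. It checks record fields only and ignores the other ID properties the specification defines for lists and maps.

```python
def missing_field_ids(schema, path=""):
    """Collect dotted paths of record fields that lack a `field-id` property."""
    missing = []
    if isinstance(schema, list):
        # Union: check every branch.
        for branch in schema:
            missing += missing_field_ids(branch, path)
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind == "record":
            for field in schema.get("fields", []):
                field_path = f"{path}.{field['name']}" if path else field["name"]
                if "field-id" not in field:
                    missing.append(field_path)
                missing += missing_field_ids(field["type"], field_path)
        elif kind == "array":
            missing += missing_field_ids(schema["items"], path)
        elif kind == "map":
            missing += missing_field_ids(schema["values"], path)
        elif isinstance(kind, (dict, list)):
            # Nested type definition wrapped in a {"type": ...} object.
            missing += missing_field_ids(kind, path)
    return missing


# The two schemas quoted in the error message above.
without_ids = {
    "type": "record",
    "name": "r2_null_value_counts",
    "fields": [
        {"name": "key", "type": "int"},
        {"name": "value", "type": "long"},
    ],
}
with_ids = {
    "type": "record",
    "name": "k121_v122",
    "fields": [
        {"name": "key", "type": "int", "field-id": 121},
        {"name": "value", "type": "long", "field-id": 122},
    ],
}

print(missing_field_ids(without_ids))
print(missing_field_ids(with_ids))
```

The first schema is reported as missing IDs on both fields, while the second yields an empty list.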
   
   ### API and Format
   
   No
   
   ### Documentation
   
   <!-- Does this change introduce a new feature -->
   


