yegangy0718 opened a new issue #4209:
URL: https://github.com/apache/iceberg/issues/4209


   We have a schema called RequestContextEvent. It defines a record named 
RecognitionMetrics inline (under EndedRecord), and another record in the same 
schema, FailedRecord, references RecognitionMetrics by name.
   {
     "type": "record",
     "name": "RequestContextEvent",
     "namespace": "avro.com.schemas",
     "fields": [
       {
         "name": "payload",
         "type": [
           "null",
           {
             "type": "record",
             "name": "RequestContext",
             "fields": [
               {
                 "name": "ended",
                 "type": [
                   "null",
                   {
                     "type": "record",
                     "name": "EndedRecord",
                     "fields": [
                       {
                         "name": "metrics",
                         "type": [
                           "null",
                           {
                             "type": "record",
                             "name": "RecognitionMetrics",
                             "fields": [
                               {
                                 "name": "optionalStringField",
                                 "type": [
                                   "null",
                                   "string"
                                 ],
                                 "default": null
                               },
                               {
                                 "name": "optionalLongField",
                                 "type": [
                                   "null",
                                   "long"
                                 ],
                                 "default": null
                               }
                             ]
                           }
                         ],
                         "default": null
                       }
                     ]
                   }
                 ],
                 "default": null
               },
               {
                 "name": "failed",
                 "type": [
                   "null",
                   {
                     "type": "record",
                     "name": "FailedRecord",
                     "fields": [
                       {
                         "name": "metrics",
                         "type": [
                           "null",
                           "RecognitionMetrics"
                         ],
                         "default": null
                       }
                     ]
                   }
                 ],
                 "default": null
               }
             ]
           }
         ],
         "default": null
       }
     ]
   }
   
   We used this schema to create the Iceberg table. Later, we wanted to delete 
optionalLongField from RecognitionMetrics, so the new schema becomes:
   
   {
     "type": "record",
     "name": "RequestContextEvent",
     "namespace": "avro.com.schemas",
     "fields": [
       {
         "name": "payload",
         "type": [
           "null",
           {
             "type": "record",
             "name": "RequestContext",
             "fields": [
               {
                 "name": "ended",
                 "type": [
                   "null",
                   {
                     "type": "record",
                     "name": "EndedRecord",
                     "fields": [
                       {
                         "name": "metrics",
                         "type": [
                           "null",
                           {
                             "type": "record",
                             "name": "RecognitionMetrics",
                             "fields": [
                               {
                                 "name": "optionalStringField",
                                 "type": [
                                   "null",
                                   "string"
                                 ],
                                 "default": null
                               }
                             ]
                           }
                         ],
                         "default": null
                       }
                     ]
                   }
                 ],
                 "default": null
               },
               {
                 "name": "failed",
                 "type": [
                   "null",
                   {
                     "type": "record",
                     "name": "FailedRecord",
                     "fields": [
                       {
                         "name": "metrics",
                         "type": [
                           "null",
                           "RecognitionMetrics"
                         ],
                         "default": null
                       }
                     ]
                   }
                 ],
                 "default": null
               }
             ]
           }
         ],
         "default": null
       }
     ]
   }
   
   Our Iceberg schema evolution code never drops a column/field, since the 
table contains historical data. To make sure data ingestion writes fields in 
the right order, we always apply AvroSchemaUtil.buildAvroProjection to the new 
schema, based on the Iceberg table schema.
   
   In this case, however, converting the resulting projection schema to its 
string form throws an error:
   Can't redefine: avro.com.schemas.RecognitionMetrics
   The cause is that the Iceberg table schema still has optionalLongField, so 
building the projection adds the deleted field back. This creates 
RecognitionMetrics twice, each with a different name for that field: one has 
optionalLongField_r2 and the other has optionalLongField_r5.
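
   To illustrate the conflict, here is a minimal Python sketch that walks an 
Avro schema given as plain dicts and reports any record whose full name is 
defined more than once with different contents. It is not Avro or Iceberg 
code. The helper names (full_name, find_redefinitions, metrics, wrapper) are 
hypothetical, and the shape of the projected schema is an assumption 
reconstructed from the description above. Which copy gets the _r2 suffix and 
which gets _r5 is chosen arbitrarily here.

```python
import json


def full_name(record, enclosing_ns):
    """Resolve a record's full name from its own or the enclosing namespace."""
    name = record["name"]
    if "." in name:
        return name
    ns = record.get("namespace", enclosing_ns)
    return f"{ns}.{name}" if ns else name


def find_redefinitions(schema, enclosing_ns=None, seen=None, conflicts=None):
    """Return full names of records defined more than once with different bodies."""
    seen = {} if seen is None else seen
    conflicts = [] if conflicts is None else conflicts
    if isinstance(schema, list):  # a union: check every branch
        for branch in schema:
            find_redefinitions(branch, enclosing_ns, seen, conflicts)
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind == "record":
            name = full_name(schema, enclosing_ns)
            ns = name.rpartition(".")[0] or None
            body = json.dumps(schema, sort_keys=True)
            if name in seen and seen[name] != body and name not in conflicts:
                conflicts.append(name)
            seen.setdefault(name, body)
            for field in schema["fields"]:
                find_redefinitions(field["type"], ns, seen, conflicts)
        elif kind == "array":
            find_redefinitions(schema["items"], enclosing_ns, seen, conflicts)
        elif kind == "map":
            find_redefinitions(schema["values"], enclosing_ns, seen, conflicts)
    # Plain strings are primitives or name references; nothing is defined there.
    return conflicts


def metrics(long_field_name):
    """Hypothetical inline copy of RecognitionMetrics with a renamed long field."""
    return {"type": "record", "name": "RecognitionMetrics", "fields": [
        {"name": "optionalStringField", "type": ["null", "string"], "default": None},
        {"name": long_field_name, "type": ["null", "long"], "default": None},
    ]}


def wrapper(name, metrics_record):
    """Build EndedRecord / FailedRecord around an inline metrics record."""
    return {"type": "record", "name": name, "fields": [
        {"name": "metrics", "type": ["null", metrics_record], "default": None},
    ]}


def event(ended_metrics, failed_metrics):
    """Assemble the top-level RequestContextEvent schema."""
    return {
        "type": "record",
        "name": "RequestContextEvent",
        "namespace": "avro.com.schemas",
        "fields": [{"name": "payload", "default": None, "type": ["null", {
            "type": "record", "name": "RequestContext", "fields": [
                {"name": "ended", "default": None,
                 "type": ["null", wrapper("EndedRecord", ended_metrics)]},
                {"name": "failed", "default": None,
                 "type": ["null", wrapper("FailedRecord", failed_metrics)]},
            ]}]}],
    }


# Assumed shape of the projection: two inline copies that differ in one field name.
projected = event(metrics("optionalLongField_r2"), metrics("optionalLongField_r5"))
conflicts = find_redefinitions(projected)
print(conflicts)
```

   Two inline copies with identical contents are not reported by this sketch, 
since only the differing field names make the two definitions of 
avro.com.schemas.RecognitionMetrics incompatible.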
   

