shardulm94 opened a new pull request #1132:
URL: https://github.com/apache/iceberg/pull/1132


   Avro files written by non-Iceberg writers can contain optional schemas where 
the NULL schema is second in the options list. If the field has a default 
value, our visitors must preserve this ordering when they rebuild the schema. 
Otherwise, constructing the field fails with errors like 
`org.apache.avro.AvroTypeException: Invalid default for field field: [] not a ["null",{"type":"array","items":"long"}]`. 
This is because the Avro [spec 
requires](https://avro.apache.org/docs/current/spec.html#Unions) the type of 
the default value to match the first option in the union.
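
To make the rule concrete, here is a minimal sketch in plain Java. It is a simplified model, not Avro's implementation: `UnionDefaultCheck` and `isValidDefault` are hypothetical names, a union is represented as a list of type names, and only the `null` and `array` cases are modelled.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class UnionDefaultCheck {
    // Simplified model (an assumption for illustration, not Avro's code):
    // a union is a list of type names, and a default value is valid only
    // if it matches the FIRST branch of that list.
    static boolean isValidDefault(List<String> unionBranches, Object defaultValue) {
        String first = unionBranches.get(0);
        switch (first) {
            case "null":
                return defaultValue == null;
            case "array":
                return defaultValue instanceof List;
            default:
                return false; // other types are omitted from this sketch
        }
    }

    public static void main(String[] args) {
        Object emptyArrayDefault = Collections.emptyList();
        // Ordering as written by the non-Iceberg writer: array first, null second.
        System.out.println(isValidDefault(Arrays.asList("array", "null"), emptyArrayDefault));
        // Ordering after a rewrite that moves null to the front.
        System.out.println(isValidDefault(Arrays.asList("null", "array"), emptyArrayDefault));
    }
}
```

In this model the first call accepts the `[]` default and the second rejects it, which mirrors the `AvroTypeException` quoted above.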
   
   The changes should be limited to the code paths that interact with 
non-Iceberg Avro files, so I believe the visitors used in 
`ProjectionDatumReader` are the only ones affected.
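
The order-preserving behaviour can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `OptionOrder` and `rebuildOption` are hypothetical names, and plain lists of type names stand in for Avro `Schema` objects.

```java
import java.util.Arrays;
import java.util.List;

public class OptionOrder {
    // Hypothetical helper: rebuilds a two-branch optional union around a new
    // value type while keeping "null" in the position it had in the original.
    static List<String> rebuildOption(List<String> original, String newValueType) {
        if (original.size() != 2 || !original.contains("null")) {
            throw new IllegalArgumentException("Not a two-branch optional union: " + original);
        }
        boolean nullFirst = "null".equals(original.get(0));
        return nullFirst
            ? Arrays.asList("null", newValueType)
            : Arrays.asList(newValueType, "null");
    }

    public static void main(String[] args) {
        // null second in the source schema stays second after the rebuild.
        System.out.println(rebuildOption(Arrays.asList("array<int>", "null"), "array<long>"));
        // null first in the source schema stays first after the rebuild.
        System.out.println(rebuildOption(Arrays.asList("null", "array<int>"), "array<long>"));
    }
}
```

Because the position of `null` is carried over, a default that was valid for the first branch of the original union stays valid for the first branch of the rebuilt one.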
   
   Error stacktraces:
   ```
   org.apache.avro.AvroTypeException: Invalid default for field field: [] not a ["null",{"type":"array","items":"int","element-id":1}]
   
        at org.apache.avro.Schema.validateDefault(Schema.java:1540)
        at org.apache.avro.Schema.access$500(Schema.java:87)
        at org.apache.avro.Schema$Field.<init>(Schema.java:521)
        at org.apache.avro.Schema$Field.<init>(Schema.java:567)
        at org.apache.iceberg.avro.PruneColumns.copyField(PruneColumns.java:252)
        at org.apache.iceberg.avro.PruneColumns.record(PruneColumns.java:83)
        at org.apache.iceberg.avro.PruneColumns.record(PruneColumns.java:34)
        at org.apache.iceberg.avro.AvroSchemaVisitor.visit(AvroSchemaVisitor.java:50)
        at org.apache.iceberg.avro.PruneColumns.rootSchema(PruneColumns.java:46)
        at org.apache.iceberg.avro.AvroSchemaUtil.pruneColumns(AvroSchemaUtil.java:99)
        at org.apache.iceberg.avro.ProjectionDatumReader.setSchema(ProjectionDatumReader.java:59)
   
   
   org.apache.avro.AvroTypeException: Invalid default for field field: [] not a ["null",{"type":"array","items":"long"}]
   
        at org.apache.avro.Schema.validateDefault(Schema.java:1540)
        at org.apache.avro.Schema.access$500(Schema.java:87)
        at org.apache.avro.Schema$Field.<init>(Schema.java:521)
        at org.apache.avro.Schema$Field.<init>(Schema.java:567)
        at org.apache.iceberg.avro.AvroSchemaUtil.copyField(AvroSchemaUtil.java:362)
        at org.apache.iceberg.avro.BuildAvroProjection.field(BuildAvroProjection.java:134)
        at org.apache.iceberg.avro.BuildAvroProjection.field(BuildAvroProjection.java:41)
        at org.apache.iceberg.avro.AvroCustomOrderSchemaVisitor$VisitFieldFuture.get(AvroCustomOrderSchemaVisitor.java:124)
        at org.apache.iceberg.relocated.com.google.common.collect.Iterators$6.transform(Iterators.java:783)
        at org.apache.iceberg.relocated.com.google.common.collect.TransformedIterator.next(TransformedIterator.java:47)
        at org.apache.iceberg.relocated.com.google.common.collect.Iterators.addAll(Iterators.java:356)
        at org.apache.iceberg.relocated.com.google.common.collect.Lists.newArrayList(Lists.java:143)
        at org.apache.iceberg.relocated.com.google.common.collect.Lists.newArrayList(Lists.java:130)
        at org.apache.iceberg.avro.BuildAvroProjection.record(BuildAvroProjection.java:60)
        at org.apache.iceberg.avro.BuildAvroProjection.record(BuildAvroProjection.java:41)
        at org.apache.iceberg.avro.AvroCustomOrderSchemaVisitor.visit(AvroCustomOrderSchemaVisitor.java:51)
        at org.apache.iceberg.avro.AvroSchemaUtil.buildAvroProjection(AvroSchemaUtil.java:104)
        at org.apache.iceberg.avro.ProjectionDatumReader.setSchema(ProjectionDatumReader.java:60)
   ```

