shardulm94 opened a new pull request #1132:
URL: https://github.com/apache/iceberg/pull/1132
Avro files written by non-Iceberg writers can contain optional schemas in which the NULL schema is the second branch of the union. If the field has a default value, our visitors must preserve this ordering. Otherwise they can fail with errors like `org.apache.avro.AvroTypeException: Invalid default for field field: [] not a ["null",{"type":"array","items":"long"}]`. This happens because the Avro [spec requires](https://avro.apache.org/docs/current/spec.html#Unions) the type of a union field's default value to match the first branch of the union.

The changes should be limited to the code paths that interact with non-Iceberg Avro files, so I believe the visitors used in `ProjectionDatumReader` (`PruneColumns` and `BuildAvroProjection`, per the traces below) are the only ones affected.

Error stack traces:

```
org.apache.avro.AvroTypeException: Invalid default for field field: [] not a ["null",{"type":"array","items":"int","element-id":1}]
	at org.apache.avro.Schema.validateDefault(Schema.java:1540)
	at org.apache.avro.Schema.access$500(Schema.java:87)
	at org.apache.avro.Schema$Field.<init>(Schema.java:521)
	at org.apache.avro.Schema$Field.<init>(Schema.java:567)
	at org.apache.iceberg.avro.PruneColumns.copyField(PruneColumns.java:252)
	at org.apache.iceberg.avro.PruneColumns.record(PruneColumns.java:83)
	at org.apache.iceberg.avro.PruneColumns.record(PruneColumns.java:34)
	at org.apache.iceberg.avro.AvroSchemaVisitor.visit(AvroSchemaVisitor.java:50)
	at org.apache.iceberg.avro.PruneColumns.rootSchema(PruneColumns.java:46)
	at org.apache.iceberg.avro.AvroSchemaUtil.pruneColumns(AvroSchemaUtil.java:99)
	at org.apache.iceberg.avro.ProjectionDatumReader.setSchema(ProjectionDatumReader.java:59)

org.apache.avro.AvroTypeException: Invalid default for field field: [] not a ["null",{"type":"array","items":"long"}]
	at org.apache.avro.Schema.validateDefault(Schema.java:1540)
	at org.apache.avro.Schema.access$500(Schema.java:87)
	at org.apache.avro.Schema$Field.<init>(Schema.java:521)
	at org.apache.avro.Schema$Field.<init>(Schema.java:567)
	at org.apache.iceberg.avro.AvroSchemaUtil.copyField(AvroSchemaUtil.java:362)
	at org.apache.iceberg.avro.BuildAvroProjection.field(BuildAvroProjection.java:134)
	at org.apache.iceberg.avro.BuildAvroProjection.field(BuildAvroProjection.java:41)
	at org.apache.iceberg.avro.AvroCustomOrderSchemaVisitor$VisitFieldFuture.get(AvroCustomOrderSchemaVisitor.java:124)
	at org.apache.iceberg.relocated.com.google.common.collect.Iterators$6.transform(Iterators.java:783)
	at org.apache.iceberg.relocated.com.google.common.collect.TransformedIterator.next(TransformedIterator.java:47)
	at org.apache.iceberg.relocated.com.google.common.collect.Iterators.addAll(Iterators.java:356)
	at org.apache.iceberg.relocated.com.google.common.collect.Lists.newArrayList(Lists.java:143)
	at org.apache.iceberg.relocated.com.google.common.collect.Lists.newArrayList(Lists.java:130)
	at org.apache.iceberg.avro.BuildAvroProjection.record(BuildAvroProjection.java:60)
	at org.apache.iceberg.avro.BuildAvroProjection.record(BuildAvroProjection.java:41)
	at org.apache.iceberg.avro.AvroCustomOrderSchemaVisitor.visit(AvroCustomOrderSchemaVisitor.java:51)
	at org.apache.iceberg.avro.AvroSchemaUtil.buildAvroProjection(AvroSchemaUtil.java:104)
	at org.apache.iceberg.avro.ProjectionDatumReader.setSchema(ProjectionDatumReader.java:60)
```
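To make the ordering rule concrete, here is a minimal Java sketch against the Avro `Schema` API. It assumes an Avro 1.8+ dependency on the classpath. `OptionalUnionOrder`, `isOptional`, and `rebuildOptional` are hypothetical names for illustration only; this is not the code in Iceberg's `PruneColumns` or `BuildAvroProjection`. The sketch rebuilds an optional union around a projected element type while keeping `null` in whichever position the file schema used. Whether the null-first variant is actually rejected depends on the Avro version validating union defaults against the first branch, as the versions in the traces above do.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import org.apache.avro.AvroTypeException;
import org.apache.avro.Schema;

// Hypothetical illustration class; not part of Iceberg.
public class OptionalUnionOrder {

  // True if the schema is a two-branch union with exactly one NULL branch.
  static boolean isOptional(Schema schema) {
    if (schema.getType() != Schema.Type.UNION) {
      return false;
    }
    List<Schema> types = schema.getTypes();
    if (types.size() != 2) {
      return false;
    }
    boolean firstNull = types.get(0).getType() == Schema.Type.NULL;
    boolean secondNull = types.get(1).getType() == Schema.Type.NULL;
    return firstNull != secondNull;
  }

  // Hypothetical helper: wrap a new (pruned or projected) non-null branch in an
  // optional union, keeping NULL in the same position as in the original schema.
  static Schema rebuildOptional(Schema original, Schema newNonNull) {
    if (!isOptional(original)) {
      throw new IllegalArgumentException("Not an optional union: " + original);
    }
    Schema nullSchema = Schema.create(Schema.Type.NULL);
    boolean nullFirst = original.getTypes().get(0).getType() == Schema.Type.NULL;
    return nullFirst
        ? Schema.createUnion(Arrays.asList(nullSchema, newNonNull))
        : Schema.createUnion(Arrays.asList(newNonNull, nullSchema));
  }

  public static void main(String[] args) {
    Schema longs = Schema.createArray(Schema.create(Schema.Type.LONG));

    // File schema from a non-Iceberg writer: NULL is second, so the default []
    // matches the first branch (the array) and the field is valid.
    Schema fileUnion = Schema.createUnion(Arrays.asList(longs, Schema.create(Schema.Type.NULL)));
    Schema.Field fileField = new Schema.Field("field", fileUnion, null, Collections.emptyList());

    // Projecting the element type while preserving NULL's position keeps the default valid.
    Schema projected = rebuildOptional(fileUnion, Schema.createArray(Schema.create(Schema.Type.INT)));
    Schema.Field projectedField =
        new Schema.Field(fileField.name(), projected, fileField.doc(), Collections.emptyList());
    System.out.println("preserved order: " + projectedField.schema());

    // Forcing NULL first mirrors the failure mode in the traces on Avro versions
    // that validate union defaults against the first branch.
    Schema nullFirst = Schema.createUnion(Arrays.asList(Schema.create(Schema.Type.NULL), longs));
    try {
      new Schema.Field("field", nullFirst, null, Collections.emptyList());
      System.out.println("null-first default accepted");
    } catch (AvroTypeException e) {
      System.out.println("null-first default rejected: " + e.getMessage());
    }
  }
}
```

The helper copies only the position of the NULL branch from the original schema rather than normalizing every optional to `["null", T]`. This matters only when a non-null default is present, but applying it unconditionally keeps the projected schema's shape closer to what the writer produced.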