rdblue commented on code in PR #4242:
URL: https://github.com/apache/iceberg/pull/4242#discussion_r870731663


##########
core/src/main/java/org/apache/iceberg/avro/AvroSchemaWithTypeVisitor.java:
##########
@@ -79,11 +79,29 @@ private static <T> T visitRecord(Types.StructType struct, 
Schema record, AvroSch
   private static <T> T visitUnion(Type type, Schema union, 
AvroSchemaWithTypeVisitor<T> visitor) {
     List<Schema> types = union.getTypes();
     List<T> options = Lists.newArrayListWithExpectedSize(types.size());
-    for (Schema branch : types) {
-      if (branch.getType() == Schema.Type.NULL) {
-        options.add(visit((Type) null, branch, visitor));
-      } else {
-        options.add(visit(type, branch, visitor));
+
+    // simple union case
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      for (Schema branch : types) {
+        if (branch.getType() == Schema.Type.NULL) {
+          options.add(visit((Type) null, branch, visitor));
+        } else {
+          options.add(visit(type, branch, visitor));
+        }
+      }
+    } else { // complex union case
+      Preconditions.checkArgument(type instanceof Types.StructType,
+          "Cannot visit invalid Iceberg type: %s for Avro complex union type: 
%s", type, union);

Review Comment:
   I think this should also check whether the schema with type visitor has the 
`tag` field. There's no guarantee that it does.
   
   Along the same lines, what happens if the struct is projected or out of 
order? I'd prefer to look up the struct field for each option in the union by 
field ID, just like we do with struct fields. For a struct field, we get the 
field ID from the Avro schema and use that to find the corresponding field in 
the Iceberg struct.
   
   If you end up using field IDs, I think the challenge is getting those field 
IDs in the Avro schema. I'm assuming that you're using `NameMapping` to work 
with the incoming Avro schemas, right? Can `NameMapping` be updated to map 
union fields?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to