rdblue commented on code in PR #4242:
URL: https://github.com/apache/iceberg/pull/4242#discussion_r877520337


##########
core/src/main/java/org/apache/iceberg/avro/AvroSchemaWithTypeVisitor.java:
##########
@@ -79,11 +79,29 @@ private static <T> T visitRecord(Types.StructType struct, Schema record, AvroSch
   private static <T> T visitUnion(Type type, Schema union, AvroSchemaWithTypeVisitor<T> visitor) {
     List<Schema> types = union.getTypes();
     List<T> options = Lists.newArrayListWithExpectedSize(types.size());
-    for (Schema branch : types) {
-      if (branch.getType() == Schema.Type.NULL) {
-        options.add(visit((Type) null, branch, visitor));
-      } else {
-        options.add(visit(type, branch, visitor));
+
+    // simple union case
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      for (Schema branch : types) {
+        if (branch.getType() == Schema.Type.NULL) {
+          options.add(visit((Type) null, branch, visitor));
+        } else {
+          options.add(visit(type, branch, visitor));
+        }
+      }
+    } else { // complex union case
+      Preconditions.checkArgument(type instanceof Types.StructType,
+          "Cannot visit invalid Iceberg type: %s for Avro complex union type: %s", type, union);

Review Comment:
   > I am a bit worried about using only the top level types since it will fail in unexpected places and could lead to cryptic error messages if things go wrong
   
   We can fail gracefully. For example, if the union contains two records, name mapping can throw an exception stating that multiple records aren't supported. If you think this is a common case, we can go further and find a solution for arbitrary nested types. That isn't too hard, actually. We could match on the child field set, which would have to match exactly or at least overlap.
   
   I don't think that matching by order is a good idea. That could easily lead to worse cases where we return the wrong data.
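
   To make the field-set idea concrete, here is a rough sketch in plain Java. It uses sets of field names in place of Iceberg and Avro types, and the class and method names (`UnionBranchMatcher`, `matchByFieldNames`) are hypothetical, not part of this PR. The assumption is that a union branch matches a candidate struct when their child field names overlap, and that zero or multiple matches should fail with a clear message rather than fall back to position.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: plain sets of field names stand in for
// Avro record branches and Iceberg struct types.
public class UnionBranchMatcher {

  // Returns the index of the single candidate whose field names overlap the
  // branch's field names. Throws if no candidate or more than one overlaps.
  public static int matchByFieldNames(Set<String> branchFields, List<Set<String>> candidates) {
    int match = -1;
    for (int i = 0; i < candidates.size(); i++) {
      Set<String> overlap = new HashSet<>(branchFields);
      overlap.retainAll(candidates.get(i)); // keep only the shared field names
      if (!overlap.isEmpty()) {
        if (match >= 0) {
          // Fail gracefully instead of guessing between two candidates
          throw new IllegalArgumentException(
              "Ambiguous union branch " + branchFields + ": matches candidates " + match + " and " + i);
        }
        match = i;
      }
    }
    if (match < 0) {
      throw new IllegalArgumentException("No candidate shares a field with union branch " + branchFields);
    }
    return match;
  }

  public static void main(String[] args) {
    Set<String> person = new HashSet<>(Arrays.asList("id", "name"));
    Set<String> location = new HashSet<>(Arrays.asList("lat", "long", "alt"));
    List<Set<String>> candidates = Arrays.asList(person, location);

    Set<String> branch = new HashSet<>(Arrays.asList("lat", "long"));
    System.out.println(matchByFieldNames(branch, candidates));
  }
}
```

   Matching on names rather than position means a reordered union cannot silently bind a branch to the wrong struct; the worst case is an exception that names the branch.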



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

