rdblue commented on code in PR #4242:
URL: https://github.com/apache/iceberg/pull/4242#discussion_r870731663
##########
core/src/main/java/org/apache/iceberg/avro/AvroSchemaWithTypeVisitor.java:
##########
@@ -79,11 +79,29 @@ private static <T> T visitRecord(Types.StructType struct,
Schema record, AvroSch
private static <T> T visitUnion(Type type, Schema union,
AvroSchemaWithTypeVisitor<T> visitor) {
List<Schema> types = union.getTypes();
List<T> options = Lists.newArrayListWithExpectedSize(types.size());
- for (Schema branch : types) {
- if (branch.getType() == Schema.Type.NULL) {
- options.add(visit((Type) null, branch, visitor));
- } else {
- options.add(visit(type, branch, visitor));
+
+ // simple union case
+ if (AvroSchemaUtil.isOptionSchema(union)) {
+ for (Schema branch : types) {
+ if (branch.getType() == Schema.Type.NULL) {
+ options.add(visit((Type) null, branch, visitor));
+ } else {
+ options.add(visit(type, branch, visitor));
+ }
+ }
+ } else { // complex union case
+ Preconditions.checkArgument(type instanceof Types.StructType,
+ "Cannot visit invalid Iceberg type: %s for Avro complex union type:
%s", type, union);
Review Comment:
I think this should also check whether the schema with type visitor has the
`tag` field. There's no guarantee that it does.
Along the same lines, what happens if the struct is projected or out of
order? I'd prefer to look up the struct field for each option in the union by
field ID, just like we do with struct fields. For a struct field, we get the
field ID from the Avro schema and use that to find the corresponding field in
the Iceberg struct.
If you end up using field IDs, I think the challenge is getting those field
IDs in the Avro schema. I'm assuming that you're using `NameMapping` to work
with the incoming Avro schemas, right? Can `NameMapping` be updated to map
union fields?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]