wmoustafa commented on code in PR #5704:
URL: https://github.com/apache/iceberg/pull/5704#discussion_r979465023
##########
core/src/main/java/org/apache/iceberg/avro/AvroSchemaWithTypeVisitor.java:
##########
@@ -82,11 +83,45 @@ private static <T> T visitRecord(
private static <T> T visitUnion(Type type, Schema union,
AvroSchemaWithTypeVisitor<T> visitor) {
List<Schema> types = union.getTypes();
List<T> options = Lists.newArrayListWithExpectedSize(types.size());
- for (Schema branch : types) {
- if (branch.getType() == Schema.Type.NULL) {
- options.add(visit((Type) null, branch, visitor));
- } else {
- options.add(visit(type, branch, visitor));
+
+ // simple union case
+ if (AvroSchemaUtil.isOptionSchema(union)) {
+ for (Schema branch : types) {
+ if (branch.getType() == Schema.Type.NULL) {
+ options.add(visit((Type) null, branch, visitor));
+ } else {
+ options.add(visit(type, branch, visitor));
+ }
+ }
+ } else { // complex union case
+ Preconditions.checkArgument(
+ type instanceof Types.StructType,
+ "Cannot visit invalid Iceberg type: %s for Avro complex union type:
%s",
+ type,
+ union);
+ Map<String, Integer> fieldNameToId =
+ (Map)
union.getObjectProp(SchemaToType.AVRO_FIELD_NAME_TO_ICEBERG_ID);
+ for (Schema branch : types) {
+ if (branch.getType() == Schema.Type.NULL) {
+ options.add(visit((Type) null, branch, visitor));
+ } else {
+ String name =
+ branch.getType().equals(Schema.Type.RECORD)
+ ? branch.getName()
+ : branch.getType().getName();
+ if (fieldNameToId.containsKey(name)) {
+ int fieldId = fieldNameToId.get(name);
+ Types.NestedField branchType = type.asStructType().field(fieldId);
+ if (branchType != null) {
+ options.add(visit(branchType.type(), branch, visitor));
+ } else {
+ Type pseudoBranchType = AvroSchemaUtil.convert(branch);
+ options.add(visit(pseudoBranchType, branch, visitor));
+ }
+ } else {
+ options.add(visit((Type) null, branch, visitor));
+ }
+ }
}
}
return visitor.union(type, union, options);
Review Comment:
I think here you are returning all the readers and in the `UnionReader` you
are trying to figure out which ones to use and which to ignore. Can we just
pass the required readers here in a way that aligns with the expected schema
ahead of time?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]