voonhous commented on code in PR #17581:
URL: https://github.com/apache/hudi/pull/17581#discussion_r2641772995
##########
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchemaUtils.java:
##########
@@ -681,4 +683,44 @@ public static String getRecordQualifiedName(String tableName) {
// Delegate to AvroSchemaUtils
return AvroSchemaUtils.getAvroRecordQualifiedName(tableName);
}
+
+ /**
+ * Resolves a union schema by finding the schema matching the given full name.
+ * Handles both simple nullable unions (null + non-null) and complex unions with multiple types.
+ *
+ * <p>This method supports the following union types:
+ * <ul>
+ * <li>Simple nullable unions: {@code ["null", "Type"]} - returns the non-null type</li>
+ * <li>Complex unions: {@code ["null", "TypeA", "TypeB"]} - returns the type matching fieldSchemaFullName</li>
+ * <li>Non-union schemas - returns the schema as-is</li>
+ * </ul>
+ *
+ * @param schema the schema to resolve (may or may not be a union)
+ * @param fieldSchemaFullName the full name of the schema to find within the union
+ * @return the resolved schema
+ * @throws HoodieSchemaException if the union cannot be resolved or no matching type is found
+ */
+ public static HoodieSchema resolveUnionSchema(HoodieSchema schema, String fieldSchemaFullName) {
Review Comment:
It's still used in the test
`org.apache.hudi.common.schema.TestHoodieSchemaUtils#testResolveUnionSchemaConsistencyWithOriginalAvroImpl`
to ensure consistency between the `Avro.Schema` and `HoodieSchema`
implementations.
I can remove this entirely within this PR if you're alright with it.
Will remove the method then. I verified that the test passes locally before
removing it, and pasted the screenshot here:
https://github.com/apache/hudi/pull/17581#discussion_r2641767329
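For reference, the resolution rules listed in the Javadoc above can be sketched in plain Java. This is only an illustration under stated assumptions, not the code in this PR: `SimpleSchema` and `resolveUnion` are hypothetical stand-ins for `HoodieSchema` and `resolveUnionSchema`, and `IllegalArgumentException` is used in place of `HoodieSchemaException` so the snippet has no Hudi or Avro dependency.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class UnionResolutionSketch {

  /** Hypothetical stand-in for a schema: a full name plus, for unions, the member types. */
  static final class SimpleSchema {
    final String fullName;
    final List<SimpleSchema> unionTypes; // empty unless this schema is a union

    private SimpleSchema(String fullName, List<SimpleSchema> unionTypes) {
      this.fullName = fullName;
      this.unionTypes = unionTypes;
    }

    static SimpleSchema of(String fullName) {
      return new SimpleSchema(fullName, Collections.emptyList());
    }

    static SimpleSchema union(SimpleSchema... types) {
      return new SimpleSchema("union", Arrays.asList(types));
    }

    boolean isUnion() {
      return !unionTypes.isEmpty();
    }

    boolean isNull() {
      return !isUnion() && "null".equals(fullName);
    }
  }

  /** Applies the three cases from the Javadoc: non-union, simple nullable union, complex union. */
  static SimpleSchema resolveUnion(SimpleSchema schema, String fieldSchemaFullName) {
    // Non-union schemas are returned as-is.
    if (!schema.isUnion()) {
      return schema;
    }
    List<SimpleSchema> nonNull = new ArrayList<>();
    for (SimpleSchema type : schema.unionTypes) {
      if (!type.isNull()) {
        nonNull.add(type);
      }
    }
    // Simple nullable union ["null", "Type"]: return the single non-null type.
    if (schema.unionTypes.size() == 2 && nonNull.size() == 1) {
      return nonNull.get(0);
    }
    // Complex union: return the member whose full name matches.
    for (SimpleSchema type : nonNull) {
      if (type.fullName.equals(fieldSchemaFullName)) {
        return type;
      }
    }
    throw new IllegalArgumentException("No type named " + fieldSchemaFullName + " in union");
  }

  public static void main(String[] args) {
    SimpleSchema nullType = SimpleSchema.of("null");
    SimpleSchema typeA = SimpleSchema.of("example.TypeA");
    SimpleSchema typeB = SimpleSchema.of("example.TypeB");

    // Simple nullable union: the name argument is not needed to pick the type.
    System.out.println(resolveUnion(SimpleSchema.union(nullType, typeA), "unused").fullName);
    // Complex union: the type is selected by full name.
    System.out.println(
        resolveUnion(SimpleSchema.union(nullType, typeA, typeB), "example.TypeB").fullName);
  }
}
```

A consistency test like the one mentioned above would run the same inputs through both implementations and compare the resolved types.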
##########
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HiveAvroSerializer.java:
##########
@@ -75,17 +76,16 @@ public class HiveAvroSerializer {
private final List<String> columnNames;
private final List<TypeInfo> columnTypes;
private final ArrayWritableObjectInspector objectInspector;
- private final Schema recordSchema;
+ private final HoodieSchema recordSchema;
private static final Logger LOG = LoggerFactory.getLogger(HiveAvroSerializer.class);
- public HiveAvroSerializer(Schema schema) {
- schema = AvroSchemaUtils.getNonNullTypeFromUnion(schema);
Review Comment:
It's still used in the test
`org.apache.hudi.common.schema.TestHoodieSchemaUtils#testResolveUnionSchemaConsistencyWithOriginalAvroImpl`
to ensure consistency between the `Avro.Schema` and `HoodieSchema`
implementations.
I can remove this entirely within this PR if you're alright with it.