yihua commented on code in PR #14314:
URL: https://github.com/apache/hudi/pull/14314#discussion_r2553293283
##########
hudi-common/src/main/java/org/apache/hudi/internal/schema/convert/InternalSchemaConverter.java:
##########
@@ -64,65 +63,65 @@ public class AvroInternalSchemaConverter {
//
// This is crucial aspect of maintaining compatibility b/w schemas,
after
// converting Avro [[Schema]]s to [[InternalSchema]]s and back
- private static final String AVRO_NAME_DELIMITER = ".";
+ private static final String FIELD_NAME_DELIMITER = ".";
/**
- * Convert internalSchema to avro Schema.
+ * Convert internalSchema to HoodieSchema.
*
* @param internalSchema internal schema.
* @param name the record name.
- * @return an avro Schema.
+ * @return an HoodieSchema.
*/
- public static Schema convert(InternalSchema internalSchema, String name) {
- return buildAvroSchemaFromInternalSchema(internalSchema, name);
+ public static HoodieSchema convert(InternalSchema internalSchema, String
name) {
+ return buildHoodieSchemaFromInternalSchema(internalSchema, name);
}
- public static InternalSchema pruneAvroSchemaToInternalSchema(Schema schema,
InternalSchema originSchema) {
+ public static InternalSchema pruneHoodieSchemaToInternalSchema(HoodieSchema
schema, InternalSchema originSchema) {
List<String> pruneNames = collectColNamesFromSchema(schema);
return InternalSchemaUtils.pruneInternalSchema(originSchema, pruneNames);
}
/**
* Collect all the leaf nodes names.
*
- * @param schema a avro schema.
+ * @param schema a HoodieSchema.
* @return leaf nodes full names.
*/
@VisibleForTesting
- static List<String> collectColNamesFromSchema(Schema schema) {
+ static List<String> collectColNamesFromSchema(HoodieSchema schema) {
List<String> result = new ArrayList<>();
Deque<String> visited = new LinkedList<>();
- collectColNamesFromAvroSchema(schema, visited, result);
+ collectColNamesFromHoodieSchema(schema, visited, result);
Review Comment:
nit: for brevity:
```suggestion
collectColNamesFromSchema(schema, visited, result);
```
##########
hudi-common/src/main/java/org/apache/hudi/internal/schema/convert/InternalSchemaConverter.java:
##########
@@ -147,99 +146,79 @@ private static void addFullNameIfLeafNode(Schema.Type
type, String name, Deque<S
}
/**
- * Converting from avro -> internal schema -> avro
+ * Converting from HoodieSchema -> internal schema -> HoodieSchema
* causes null to always be first in unions.
* if we compare a schema that has not been converted to internal schema
* at any stage, the difference in ordering can cause issues. To resolve
this,
- * we order null to be first for any avro schema that enters into hudi.
+ * we order null to be first for any HoodieSchema that enters into hudi.
* AvroSchemaUtils.isProjectionOfInternal uses index based comparison for
unions.
* Spark and flink don't support complex unions so this would not be an issue
* but for the metadata table HoodieMetadata.avsc uses a trick where we have
a bunch of
* different types wrapped in record for col stats.
*
- * @param schema avro schema.
- * @return an avro Schema where null is the first.
+ * @param schema HoodieSchema.
+ * @return a HoodieSchema where null is the first.
*/
- public static Schema fixNullOrdering(Schema schema) {
+ public static HoodieSchema fixNullOrdering(HoodieSchema schema) {
if (schema == null) {
- return Schema.create(Schema.Type.NULL);
- } else if (schema.getType() == Schema.Type.NULL) {
+ return HoodieSchema.create(HoodieSchemaType.NULL);
+ } else if (schema.getType() == HoodieSchemaType.NULL) {
return schema;
}
return convert(convert(schema), schema.getFullName());
}
/**
- * Convert RecordType to avro Schema.
+ * Convert RecordType to HoodieSchema.
*
* @param type internal schema.
* @param name the record name.
- * @return an avro Schema.
+ * @return a HoodieSchema.
*/
- public static Schema convert(Types.RecordType type, String name) {
- return buildAvroSchemaFromType(type, name);
+ public static HoodieSchema convert(Types.RecordType type, String name) {
+ return buildHoodieSchemaFromType(type, name);
Review Comment:
```suggestion
return buildSchemaFromType(type, name);
```
##########
hudi-common/src/main/java/org/apache/hudi/internal/schema/convert/InternalSchemaConverter.java:
##########
@@ -18,7 +18,10 @@
package org.apache.hudi.internal.schema.convert;
-import org.apache.hudi.avro.AvroSchemaUtils;
+import org.apache.hudi.common.schema.HoodieJsonProperties;
+import org.apache.hudi.common.schema.HoodieSchema;
+import org.apache.hudi.common.schema.HoodieSchemaField;
+import org.apache.hudi.common.schema.HoodieSchemaType;
Review Comment:
I start to think that it's better to remove the `Hoodie` prefix from these
class names in `org.apache.hudi.common.schema` packages, so it's easier to
read, as eventually we are going to only maintain one schema system. Wdyt?
##########
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java:
##########
@@ -384,12 +374,26 @@ public HoodieSchemaType getType() {
}
/**
- * Returns the name of this schema, if it has one.
+ * If this is a union schema representing a null, it returns the underlying
non-null type.
+ * @return the underlying non-null HoodieSchema
+ */
+ public HoodieSchema getUnderlyingType() {
+ if (HoodieSchemaType.UNION == type) {
+ return getTypes().stream()
+ .filter(schema -> schema.getType() != HoodieSchemaType.NULL)
+ .findFirst()
+ .orElseThrow(() -> new IllegalArgumentException("No non-null type
found in Union"));
Review Comment:
Use `HoodieSchema#getNonNullType` instead? Or is this only intended to get
the type, so missing an element of a union is fine?
##########
hudi-common/src/main/java/org/apache/hudi/internal/schema/convert/InternalSchemaConverter.java:
##########
@@ -266,74 +245,75 @@ private static void checkNullType(Type fieldType, String
fieldName, Deque<String
}
/**
- * Converts an avro schema into hudi type.
+ * Converts an HoodieSchema into internal type.
*
- * @param schema a avro schema.
- * @param visited track the visit node when do traversal for avro schema;
used to check if the name of avro record schema is correct.
+ * @param schema a HoodieSchema.
+ * @param visited track the visit node when do traversal for HoodieSchema;
used to check if the name of record schema is correct.
* @param currentFieldPath the dot-separated path to the current field;
empty at the root and always ends in a '.' otherwise for ease of concatenation.
* @param nextId an initial id which used to create id for all fields.
- * @return a hudi type match avro schema.
+ * @return a hudi type match HoodieSchema.
*/
- private static Type visitAvroSchemaToBuildType(Schema schema, Deque<String>
visited, String currentFieldPath, AtomicInteger nextId, Map<String, Integer>
existingNameToPosition) {
+ private static Type visitHoodieSchemaToBuildType(HoodieSchema schema,
Deque<String> visited, String currentFieldPath, AtomicInteger nextId,
Map<String, Integer> existingNameToPosition) {
Review Comment:
```suggestion
private static Type visitSchemaToBuildType(HoodieSchema schema,
Deque<String> visited, String currentFieldPath, AtomicInteger nextId,
Map<String, Integer> existingNameToPosition) {
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]