jonvex commented on code in PR #11154:
URL: https://github.com/apache/hudi/pull/11154#discussion_r1622678545
##########
hudi-common/src/main/java/org/apache/hudi/internal/schema/convert/AvroInternalSchemaConverter.java:
##########
@@ -117,10 +120,19 @@ public static Schema convert(Type type, String name) {
/** Convert an avro schema into internal type. */
public static Type convertToField(Schema schema) {
- return buildTypeFromAvroSchema(schema);
+ return buildTypeFromAvroSchema(schema, Collections.emptyMap());
}
+ private static Type convertToField(Schema schema, Map<String, Integer>
existingFieldNameToPositionMapping) {
+ return buildTypeFromAvroSchema(schema, existingFieldNameToPositionMapping);
+ }
+
+
Review Comment:
remove empty line
##########
hudi-common/src/main/java/org/apache/hudi/internal/schema/convert/AvroInternalSchemaConverter.java:
##########
@@ -151,11 +163,11 @@ public static Schema nullableSchema(Schema schema) {
* @param schema a avro schema.
* @return a hudi type.
*/
- public static Type buildTypeFromAvroSchema(Schema schema) {
+ public static Type buildTypeFromAvroSchema(Schema schema, Map<String,
Integer> existingNameToPositions) {
// set flag to check this has not been visited.
- Deque<String> visited = new LinkedList();
- AtomicInteger nextId = new AtomicInteger(1);
- return visitAvroSchemaToBuildType(schema, visited, true, nextId);
+ Deque<String> visited = new LinkedList<>();
+ AtomicInteger nextId = new AtomicInteger(0);
Review Comment:
why do we go from 1->0? Is this because we remove
```
if (firstVisitRoot) {
nextAssignId = 0;
}
```
##########
hudi-spark-datasource/hudi-spark-common/src/test/java/org/apache/hudi/TestHoodieSchemaUtils.java:
##########
@@ -239,6 +240,51 @@ void testMissingColumn(boolean allowDroppedColumns) {
}
}
+ @Test
+ void testFieldReordering() {
+ // field order changes and incoming schema is missing an existing field
+ Schema start = createRecord("reorderFields",
+ createPrimitiveField("field1", Schema.Type.INT),
+ createPrimitiveField("field2", Schema.Type.INT),
+ createPrimitiveField("field3", Schema.Type.INT));
+ Schema end = createRecord("reorderFields",
+ createPrimitiveField("field3", Schema.Type.INT),
+ createPrimitiveField("field1", Schema.Type.INT));
+ assertEquals(start, deduceWriterSchema(end, start, true));
+
+ // nested field ordering changes and new field is added
+ start = createRecord("reorderNestedFields",
+ createPrimitiveField("field1", Schema.Type.INT),
+ createPrimitiveField("field2", Schema.Type.INT),
+ createArrayField("field3", createRecord("nestedRecord",
+ createPrimitiveField("nestedField1", Schema.Type.INT),
+ createPrimitiveField("nestedField2", Schema.Type.INT),
+ createPrimitiveField("nestedField3", Schema.Type.INT))),
+ createPrimitiveField("field4", Schema.Type.INT));
+ end = createRecord("reorderNestedFields",
+ createPrimitiveField("field1", Schema.Type.INT),
+ createPrimitiveField("field2", Schema.Type.INT),
+ createPrimitiveField("field5", Schema.Type.INT),
+ createArrayField("field3", createRecord("nestedRecord",
+ createPrimitiveField("nestedField2", Schema.Type.INT),
+ createPrimitiveField("nestedField1", Schema.Type.INT),
+ createPrimitiveField("nestedField3", Schema.Type.INT),
+ createPrimitiveField("nestedField4", Schema.Type.INT))),
+ createPrimitiveField("field4", Schema.Type.INT));
+
+ Schema expected = createRecord("reorderNestedFields",
+ createPrimitiveField("field1", Schema.Type.INT),
+ createPrimitiveField("field2", Schema.Type.INT),
+ createArrayField("field3", createRecord("reorderNestedFields.field3",
Review Comment:
ok, can you please change the nested record name to
`reorderNestedFields.field3` in start and end? That way we isolate what we are
testing
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]