hejiay commented on code in PR #8095:
URL: https://github.com/apache/inlong/pull/8095#discussion_r1208799987
##########
inlong-sort/sort-flink/sort-flink-v1.13/sort-connectors/iceberg/src/main/java/org/apache/inlong/sort/iceberg/sink/multiple/SchemaChangeUtils.java:
##########
@@ -51,49 +53,81 @@ public class SchemaChangeUtils {
static List<TableChange> diffSchema(Schema oldSchema, Schema newSchema) {
List<String> oldFields =
oldSchema.columns().stream().map(NestedField::name).collect(Collectors.toList());
List<String> newFields =
newSchema.columns().stream().map(NestedField::name).collect(Collectors.toList());
- int oi = 0;
- int ni = 0;
+ Set<String> oldFieldSet = new HashSet<>(oldFields);
+ Set<String> newFieldSet = new HashSet<>(newFields);
+
+ Set<String> intersectColSet = Sets.intersection(oldFieldSet, newFieldSet);
+ Set<String> colsToDelete = Sets.difference(oldFieldSet, newFieldSet);
+ Set<String> colsToAdd = Sets.difference(newFieldSet, oldFieldSet);
+
List<TableChange> tableChanges = new ArrayList<>();
- while (ni < newFields.size()) {
- if (oi < oldFields.size() && oldFields.get(oi).equals(newFields.get(ni))) {
- oi++;
- ni++;
- } else {
- NestedField newField = newSchema.findField(newFields.get(ni));
+
+ // step0: Unknown change
+ if (!colsToDelete.isEmpty() && !colsToAdd.isEmpty()) {
Review Comment:
Of course not. In Iceberg, each field has a unique ID. Even if a field is
deleted, it still exists, but its content is hidden when querying. A modify is
a single step, [a, b, c] -> [a, b, d], and the ID of field d does not change.
With an add plus a delete, field d gets a new unique ID. So an add followed by
a delete cannot be confused with a modify.
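
To make the distinction concrete, here is a minimal sketch in plain Java. It does not use the Iceberg API: `Field`, `FieldIdDiffSketch`, and `describeChanges` are hypothetical names invented for illustration, and the IDs are assigned by hand. It only shows that comparing by field ID separates a modify from an add plus a delete, even though both produce the same column names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FieldIdDiffSketch {

    // Hypothetical stand-in for a column: a stable id plus a name.
    static final class Field {
        final int id;
        final String name;

        Field(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Classifies changes by field id rather than by name.
    static List<String> describeChanges(List<Field> oldFields, List<Field> newFields) {
        Map<Integer, String> oldById = new HashMap<>();
        for (Field f : oldFields) {
            oldById.put(f.id, f.name);
        }
        Map<Integer, String> newById = new HashMap<>();
        for (Field f : newFields) {
            newById.put(f.id, f.name);
        }

        List<String> changes = new ArrayList<>();
        for (Field f : newFields) {
            String oldName = oldById.get(f.id);
            if (oldName == null) {
                // The id was never seen before: a genuinely new field.
                changes.add("ADD " + f.name);
            } else if (!oldName.equals(f.name)) {
                // Same id, different name: the existing field was modified.
                changes.add("MODIFY " + oldName + " -> " + f.name);
            }
        }
        for (Field f : oldFields) {
            if (!newById.containsKey(f.id)) {
                // The id is gone from the new schema: the field was deleted.
                changes.add("DELETE " + f.name);
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        List<Field> oldFields = Arrays.asList(
                new Field(1, "a"), new Field(2, "b"), new Field(3, "c"));

        // Modify: field 3 keeps its id, only the name changes.
        List<Field> modified = Arrays.asList(
                new Field(1, "a"), new Field(2, "b"), new Field(3, "d"));
        System.out.println(describeChanges(oldFields, modified));

        // Add plus delete: "d" arrives with a fresh id, and id 3 disappears.
        List<Field> addedAndDeleted = Arrays.asList(
                new Field(1, "a"), new Field(2, "b"), new Field(4, "d"));
        System.out.println(describeChanges(oldFields, addedAndDeleted));
    }
}
```

Both target schemas have the column names [a, b, d], so a name-only diff sees the same thing in each case. Only the ID comparison tells them apart.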