Re: [PR] Fix Delta to Iceberg not working on column mapping enabled Delta source table [incubator-xtable]

via GitHub Mon, 22 Dec 2025 00:13:33 -0800


xr-chen commented on code in PR #766:
URL: https://github.com/apache/incubator-xtable/pull/766#discussion_r2639025635



##########
xtable-core/src/main/java/org/apache/xtable/iceberg/IcebergConversionTarget.java:
##########
@@ -161,12 +169,42 @@ private void initializeTableIfRequired(InternalTable 
internalTable) {
     }
   }
 
+  private MappedFields updateNameMapping(MappedFields mapping, Map<Integer, 
String> updates) {
+    if (mapping == null) {
+      return null;
+    }
+    List<MappedField> fieldResults = new ArrayList<>();
+    for (MappedField field : mapping.fields()) {
+      Set<String> fieldNames = new HashSet<>(field.names());
+      if (updates.containsKey(field.id())) {
+        fieldNames.add(updates.get(field.id()));
+      }
+      MappedFields nestedMapping = updateNameMapping(field.nestedMapping(), 
updates);
+      fieldResults.add(MappedField.of(field.id(), fieldNames, nestedMapping));
+    }
+    return MappedFields.of(fieldResults);
+  }
+
   @Override
   public void syncSchema(InternalSchema schema) {
     Schema latestSchema = schemaExtractor.toIceberg(schema);
+    String mappingJson = 
transaction.table().properties().get(TableProperties.DEFAULT_NAME_MAPPING);
+    boolean hasFieldIds =
+        schema.getAllFields().stream().anyMatch(field -> field.getFieldId() != 
null);
+    // Recreate name mapping when field IDs were provided in the source schema 
to ensure every
+    // field in the mapping was assigned the same ID as what is in the source 
schema
+    NameMapping mapping =
+        mappingJson == null || hasFieldIds
+            ? MappingUtil.create(latestSchema)
+            : NameMappingParser.fromJson(mappingJson);

Review Comment:
   Just tested, Iceberg will automatically update the name mapping as well when 
updating the schema with `UpdateSchema`, for example, when a new column `city` 
is added, the corresponding entry can also be found in the name mapping, schema
   ```
   {
         "id" : 19,
         "name" : "street",
         "required" : false,
         "type" : "string"
       }, {
         "id" : 29,
         "name" : "city",
         "required" : false,
         "type" : "string"
       }
   ```
   name mapping
   ```
   {
     "field-id" : 19,
     "names" : [ "street" ]
   }, {
     "field-id" : 29,
     "names" : [ "city" ]
   }
   ```
   It's unnecessary to recreate the name mapping based on the latest schema 
again, and recreating might result in the name mapping not aligning with the 
table schema. For example, the field Id of `city`  is supposed to be 20 in the 
`latestSchema`, if no field Ids are provided in the source table, the name 
mapping created from `latestSchema` will also map 20 to `city`, but in the 
table schema, 20 is already assigned to one of the nested fields which is of 
course not the field `city`.
   
   And when the schema is updated with the operations API, the name mapping 
will be recreated based on the latest schema to reflect the schema change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Fix Delta to Iceberg not working on column mapping enabled Delta source table [incubator-xtable]

Reply via email to