[GitHub] [iceberg] ConeyLiu commented on a diff in pull request #4898: Core: Add schema_id to ContentFile/ManifestFile

GitBox Fri, 10 Jun 2022 07:15:24 -0700


ConeyLiu commented on code in PR #4898:
URL: https://github.com/apache/iceberg/pull/4898#discussion_r894580150



##########
core/src/main/java/org/apache/iceberg/avro/Avro.java:
##########
@@ -338,13 +340,19 @@ public DataWriteBuilder withSortOrder(SortOrder newSortOrder) {
       return this;
     }
 
+    public DataWriteBuilder withSchemaId(int newSchemaId) {
+      this.schemaId = newSchemaId;

Review Comment:
   You can see this in the following code. `SparkSchemaUtil.convert` calls
   `new Schema(struct.fields())`, which creates the schema with the default
   schema ID 0.
   
   ```java
     Schema writeSchema = validateOrMergeWriteSchema(table, dsSchema, writeConf);
   
     // validateOrMergeWriteSchema
     private static Schema validateOrMergeWriteSchema(
         Table table, StructType dsSchema, SparkWriteConf writeConf) {
       Schema writeSchema;
       if (writeConf.mergeSchema()) {
         ...
       } else {
         writeSchema = SparkSchemaUtil.convert(table.schema(), dsSchema);
         TypeUtil.validateWriteSchema(
             table.schema(), writeSchema, writeConf.checkNullability(), writeConf.checkOrdering());
       }
   
       return writeSchema;
     }
   
     // SparkSchemaUtil.convert
     public static Schema convert(Schema baseSchema, StructType sparkType) {
       // convert to a type with fresh ids
       Types.StructType struct =
           SparkTypeVisitor.visit(sparkType, new SparkTypeToType(sparkType)).asStructType();
       // reassign ids to match the base schema
       Schema schema = TypeUtil.reassignIds(new Schema(struct.fields()), baseSchema);
       // fix types that can't be represented in Spark (UUID and Fixed)
       return SparkFixupTypes.fixup(schema, baseSchema);
     }
   ```
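
To make the point above concrete, here is a minimal, self-contained sketch of the behavior. It does not use Iceberg's classes: `SchemaIdSketch`, its nested `Schema`, and the helpers `rebuildFromColumns` and `rebuildKeepingId` are hypothetical stand-ins, and columns are modeled as plain strings. The only assumption it illustrates is that a constructor taking just the fields falls back to a default ID of 0, so a schema rebuilt from another schema's fields loses the original ID unless it is passed along explicitly.

```java
import java.util.List;

// Simplified stand-in for illustration only; this is NOT org.apache.iceberg.Schema.
final class SchemaIdSketch {
  static final class Schema {
    static final int DEFAULT_SCHEMA_ID = 0;

    private final int schemaId;
    private final List<String> columns;

    // Fields-only constructor: no ID is supplied, so the default is used.
    Schema(List<String> columns) {
      this(DEFAULT_SCHEMA_ID, columns);
    }

    Schema(int schemaId, List<String> columns) {
      this.schemaId = schemaId;
      this.columns = List.copyOf(columns);
    }

    int schemaId() {
      return schemaId;
    }

    List<String> columns() {
      return columns;
    }
  }

  // Hypothetical helper: rebuilds a schema from its columns only,
  // analogous to calling a fields-only constructor during conversion.
  static Schema rebuildFromColumns(Schema source) {
    return new Schema(source.columns());
  }

  // Hypothetical alternative: carries the source schema's ID across explicitly.
  static Schema rebuildKeepingId(Schema source) {
    return new Schema(source.schemaId(), source.columns());
  }

  public static void main(String[] args) {
    Schema tableSchema = new Schema(3, List.of("id", "data"));
    System.out.println(rebuildFromColumns(tableSchema).schemaId()); // prints 0
    System.out.println(rebuildKeepingId(tableSchema).schemaId());   // prints 3
  }
}
```

In this sketch the table schema has ID 3, but the rebuilt copy reports 0, so a writer that reads the ID from the converted schema would record the default rather than the table's current schema ID.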


