rdblue commented on a change in pull request #2465:
URL: https://github.com/apache/iceberg/pull/2465#discussion_r615310750



##########
File path: core/src/main/java/org/apache/iceberg/SchemaParser.java
##########
@@ -158,7 +169,7 @@ public static String toJson(Schema schema, boolean pretty) {
       if (pretty) {
         generator.useDefaultPrettyPrinter();
       }
-      toJson(schema.asStruct(), generator);
+      toJson(schema.asStruct(), schema.schemaId(), schema.rowIdentifiers(), 
generator);

Review comment:
       I think we do want to pass the schema ID here. Whenever we can, we 
should preserve the ID. My comment on the other PR was recommending that we 
preserve the ID where possible even when reading and writing manifest files.
   
   I think I missed @yyanyy's follow-up comment about other places that embed 
the schema ID. I tend to agree with that line of reasoning, but because of it 
I'm coming to a different conclusion. Rather than removing the schema ID from 
those places, I think we should try to maintain them.
   
   The risk to keeping schema IDs in other files is that another file may be 
written with a schema ID that is lost because an older writer (e.g. v1) didn't 
write out the schema list, only the current schema. If that happens, then an 
Avro file might write a schema with ID 1, which is then lost and later 
redefined. But as Yan noted, the actual schema (fields) are stored in those 
files and we don't need to look up the schema from the ID. So we have a choice 
of whether to discard the schema ID or not. I'd prefer not to drop it, since we 
can easily detect whether the schema actually matches the one with its ID and 
because we won't really need to do the lookup. I think it adds complexity to 
remember whether or not to drop the ID.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to