rdblue commented on a change in pull request #2465:
URL: https://github.com/apache/iceberg/pull/2465#discussion_r615310750
##########
File path: core/src/main/java/org/apache/iceberg/SchemaParser.java
##########
@@ -158,7 +169,7 @@ public static String toJson(Schema schema, boolean pretty) {
if (pretty) {
generator.useDefaultPrettyPrinter();
}
- toJson(schema.asStruct(), generator);
+ toJson(schema.asStruct(), schema.schemaId(), schema.rowIdentifiers(),
generator);
Review comment:
I think we do want to pass the schema ID here. Whenever we can, we
should preserve the ID. My comment on the other PR was recommending that we
preserve the ID where possible even when reading and writing manifest files.
I think I missed @yyanyy's follow-up comment about other places that embed
the schema ID. I tend to agree with that line of reasoning, but because of it
I'm coming to a different conclusion. Rather than removing the schema ID from
those places, I think we should try to maintain them.
The risk to keeping schema IDs in other files is that another file may be
written with a schema ID that is lost because an older writer (e.g. v1) didn't
write out the schema list, only the current schema. If that happens, then an
Avro file might write a schema with ID 1, which is then lost and later
redefined. But as Yan noted, the actual schema (fields) are stored in those
files and we don't need to look up the schema from the ID. So we have a choice
of whether to discard the schema ID or not. I'd prefer not to drop it, since we
can easily detect whether the schema actually matches the one with its ID and
because we won't really need to do the lookup. I think it adds complexity to
remember whether or not to drop the ID.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]