rdblue commented on a change in pull request #2096:
URL: https://github.com/apache/iceberg/pull/2096#discussion_r575716542
##########
File path: core/src/main/java/org/apache/iceberg/SchemaParser.java
##########
@@ -253,4 +266,15 @@ public static Schema fromJson(String json) {
}
});
}
+
+ public static Schema fromJsonWithId(int schemaId, JsonNode json) {
Review comment:
I'd prefer if all of the parsing methods returned the correct ID instead
of having some that use a version that ignores the ID.
I think you chose to use two methods because there are some situations where
the ID wasn't written and you don't want the schema to have the wrong ID. But,
because the `fromJson` method uses the `Schema` constructor that does not pass
the ID, the schemas will have the default ID, 0, anyway.
As long as the schemas have an ID, I think there is no need to have multiple
parser methods, both with and without ID. Also, we need to make sure that the
IDs are consistent when reading the schema from files that were written before
IDs. That should just be manifest files, where the schema is embedded in the
header.
For manifests, we need to add the schema ID when writing
(https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/ManifestWriter.java#L190).
And when reading, we will need to update how the spec is constructed. If the
spec is loaded by ID then it's fine. Otherwise we should load the schema by ID
to parse the spec, and if that doesn't work we should parse both the schema and
the spec
(https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/ManifestReader.java#L115-L120).
I think we should probably also try to take the parsed schema and find the
correct one from table metadata so that we don't have an incorrect ID anywhere.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]