szehon-ho commented on code in PR #4546:
URL: https://github.com/apache/iceberg/pull/4546#discussion_r858049267
##########
hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCatalog.java:
##########
@@ -562,6 +562,60 @@ public void testSetSnapshotSummary() throws Exception {
Assert.assertEquals("The snapshot summary must not be in parameters due to
the size limit", 0, parameters.size());
}
+ @Test
+ public void testSetDefaultPartitionSpec() throws Exception {
+ Schema schema = new Schema(
+ required(1, "id", Types.IntegerType.get(), "unique ID"),
+ required(2, "data", Types.StringType.get())
+ );
+ TableIdentifier tableIdent = TableIdentifier.of(DB_NAME, "tbl");
+
+ try {
+ Table table = catalog.buildTable(tableIdent, schema).create();
+ Assert.assertFalse("Must not have default partition spec",
+
hmsTableParameters().containsKey(TableProperties.DEFAULT_PARTITION_SPEC));
+
+ table.updateSpec().addField(bucket("data", 16)).commit();
+ Assert.assertEquals(PartitionSpecParser.toJson(table.spec()),
+ hmsTableParameters().get(TableProperties.DEFAULT_PARTITION_SPEC));
+ } finally {
+ catalog.dropTable(tableIdent);
+ }
+ }
+
+ @Test
+ public void testSetCurrentSchema() throws Exception {
+ Schema schema = new Schema(
+ required(1, "id", Types.IntegerType.get(), "unique ID"),
+ required(2, "data", Types.StringType.get())
+ );
+ TableIdentifier tableIdent = TableIdentifier.of(DB_NAME, "tbl");
+
+ try {
+ Table table = catalog.buildTable(tableIdent, schema).create();
+
+ Assert.assertEquals(SchemaParser.toJson(table.schema()),
+ hmsTableParameters().get(TableProperties.CURRENT_SCHEMA));
+
+ // add many new fields to make the schema json string exceed the limit
+ UpdateSchema updateSchema = table.updateSchema();
+ for (int i = 0; i < 600; i++) {
+ updateSchema.addColumn("new_col_" + i, Types.StringType.get());
+ }
+ updateSchema.commit();
+
+ Assert.assertTrue(SchemaParser.toJson(table.schema()).length() > 32672);
Review Comment:
Lines 597 and 607 both call SchemaParser.toJson(table.schema()), don't they?
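
A minimal sketch of the deduplication being suggested (illustrative only; it assumes the imports already present in TestHiveCatalog.java, and the trailing comment stands in for whatever line 607 actually checks):

```java
// Serialize the schema once and reuse the string in both assertions,
// instead of calling SchemaParser.toJson(table.schema()) twice.
String schemaJson = SchemaParser.toJson(table.schema());
Assert.assertTrue(schemaJson.length() > 32672);
// ... the later assertion (line 607) would then reuse schemaJson rather than
// re-serializing the schema.
```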
##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -433,6 +439,38 @@ void setSnapshotSummary(Map<String, String> parameters, Snapshot currentSnapshot
}
}
+ private void setSchema(TableMetadata metadata, Map<String, String> parameters) {
+ parameters.remove(TableProperties.CURRENT_SCHEMA);
+ if (metadata.schema() != null) {
+ String schema = SchemaParser.toJson(metadata.schema());
+ setField(parameters, TableProperties.CURRENT_SCHEMA, schema);
+ }
+ }
+
+ private void setPartitionSpec(TableMetadata metadata, Map<String, String> parameters) {
+ parameters.remove(TableProperties.DEFAULT_PARTITION_SPEC);
+ if (metadata.spec() != null && metadata.spec().isPartitioned()) {
+ String spec = PartitionSpecParser.toJson(metadata.spec());
+ setField(parameters, TableProperties.DEFAULT_PARTITION_SPEC, spec);
+ }
+ }
+
+ private void setSortOrder(TableMetadata metadata, Map<String, String> parameters) {
+ parameters.remove(TableProperties.DEFAULT_SORT_ORDER);
+ if (metadata.sortOrder() != null && metadata.sortOrder().isSorted()) {
+ String sortOrder = SortOrderParser.toJson(metadata.sortOrder());
+ setField(parameters, TableProperties.DEFAULT_SORT_ORDER, sortOrder);
+ }
+ }
+
+ private void setField(Map<String, String> parameters, String key, String value) {
Review Comment:
Makes sense, but could we at least take care of the ones in this PR (schema, partition spec, sort order)? We can have a follow-up for the others not touched by this PR.
I just didn't want to leave it in a state where we waste CPU cycles on JSON serialization needlessly when the user turns this feature off, since this runs in the critical commit block, unlike the original serialization, which happens beforehand. The other HMS table properties are also less CPU-intensive to me, as they just read a field.
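
For illustration, one possible shape of that guard (a sketch only; the boolean field exposeSchemaInHms and how it would be configured are hypothetical, not existing Iceberg code):

```java
private void setSchema(TableMetadata metadata, Map<String, String> parameters) {
  parameters.remove(TableProperties.CURRENT_SCHEMA);
  // Check the toggle before serializing, so no JSON work is done in the
  // commit path when the feature is disabled.
  if (exposeSchemaInHms && metadata.schema() != null) {
    setField(parameters, TableProperties.CURRENT_SCHEMA, SchemaParser.toJson(metadata.schema()));
  }
}
```

The same pattern would apply to setPartitionSpec and setSortOrder.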
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.