szehon-ho commented on code in PR #4546:
URL: https://github.com/apache/iceberg/pull/4546#discussion_r858049267
##########
hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCatalog.java:
##########
@@ -562,6 +562,60 @@ public void testSetSnapshotSummary() throws Exception {
Assert.assertEquals("The snapshot summary must not be in parameters due to
the size limit", 0, parameters.size());
}
+ @Test
+ public void testSetDefaultPartitionSpec() throws Exception {
+ Schema schema = new Schema(
+ required(1, "id", Types.IntegerType.get(), "unique ID"),
+ required(2, "data", Types.StringType.get())
+ );
+ TableIdentifier tableIdent = TableIdentifier.of(DB_NAME, "tbl");
+
+ try {
+ Table table = catalog.buildTable(tableIdent, schema).create();
+ Assert.assertFalse("Must not have default partition spec",
+
hmsTableParameters().containsKey(TableProperties.DEFAULT_PARTITION_SPEC));
+
+ table.updateSpec().addField(bucket("data", 16)).commit();
+ Assert.assertEquals(PartitionSpecParser.toJson(table.spec()),
+ hmsTableParameters().get(TableProperties.DEFAULT_PARTITION_SPEC));
+ } finally {
+ catalog.dropTable(tableIdent);
+ }
+ }
+
+ @Test
+ public void testSetCurrentSchema() throws Exception {
+ Schema schema = new Schema(
+ required(1, "id", Types.IntegerType.get(), "unique ID"),
+ required(2, "data", Types.StringType.get())
+ );
+ TableIdentifier tableIdent = TableIdentifier.of(DB_NAME, "tbl");
+
+ try {
+ Table table = catalog.buildTable(tableIdent, schema).create();
+
+ Assert.assertEquals(SchemaParser.toJson(table.schema()),
+ hmsTableParameters().get(TableProperties.CURRENT_SCHEMA));
+
+ // add many new fields to make the schema json string exceed the limit
+ UpdateSchema updateSchema = table.updateSchema();
+ for (int i = 0; i < 600; i++) {
+ updateSchema.addColumn("new_col_" + i, Types.StringType.get());
+ }
+ updateSchema.commit();
+
+ Assert.assertTrue(SchemaParser.toJson(table.schema()).length() > 32672);
Review Comment:
Lines 597 and 607 both call SchemaParser.toJson(table.schema()), don't they?
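
A minimal sketch of the deduplication being suggested (illustrative only; it assumes the imports already present in TestHiveCatalog.java, and the trailing comment stands in for whatever line 607 actually checks):

```java
// Serialize the schema once and reuse the string in both assertions,
// instead of calling SchemaParser.toJson(table.schema()) twice.
String schemaJson = SchemaParser.toJson(table.schema());
Assert.assertTrue(schemaJson.length() > 32672);
// ... the later assertion (line 607) would then reuse schemaJson rather than
// re-serializing the schema.
```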
##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -433,6 +439,38 @@ void setSnapshotSummary(Map<String, String> parameters, Snapshot currentSnapshot
}
}
+ private void setSchema(TableMetadata metadata, Map<String, String> parameters) {
+ parameters.remove(TableProperties.CURRENT_SCHEMA);
+ if (metadata.schema() != null) {
+ String schema = SchemaParser.toJson(metadata.schema());
+ setField(parameters, TableProperties.CURRENT_SCHEMA, schema);
+ }
+ }
+
+ private void setPartitionSpec(TableMetadata metadata, Map<String, String> parameters) {
+ parameters.remove(TableProperties.DEFAULT_PARTITION_SPEC);
+ if (metadata.spec() != null && metadata.spec().isPartitioned()) {
+ String spec = PartitionSpecParser.toJson(metadata.spec());
+ setField(parameters, TableProperties.DEFAULT_PARTITION_SPEC, spec);
+ }
+ }
+
+ private void setSortOrder(TableMetadata metadata, Map<String, String> parameters) {
+ parameters.remove(TableProperties.DEFAULT_SORT_ORDER);
+ if (metadata.sortOrder() != null && metadata.sortOrder().isSorted()) {
+ String sortOrder = SortOrderParser.toJson(metadata.sortOrder());
+ setField(parameters, TableProperties.DEFAULT_SORT_ORDER, sortOrder);
+ }
+ }
+
+ private void setField(Map<String, String> parameters, String key, String value) {
Review Comment:
Makes sense, but could we at least take care of the ones in this PR (schema, partition spec, sort order)? We can have a follow-up for the others not touched by this PR.
I just didn't want to leave it in a state where we waste CPU cycles on JSON serialization needlessly when the user turns this feature off, since this runs in the critical commit block, unlike the original serialization, which happens beforehand. The other HMS table properties are also less CPU-intensive to me, as they just read a field.
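
For illustration, one possible shape of that guard (a sketch only; the boolean field exposeSchemaInHms and how it would be configured are hypothetical, not existing Iceberg code):

```java
private void setSchema(TableMetadata metadata, Map<String, String> parameters) {
  parameters.remove(TableProperties.CURRENT_SCHEMA);
  // Check the toggle before serializing, so no JSON work is done in the
  // commit path when the feature is disabled.
  if (exposeSchemaInHms && metadata.schema() != null) {
    setField(parameters, TableProperties.CURRENT_SCHEMA, SchemaParser.toJson(metadata.schema()));
  }
}
```

The same pattern would apply to setPartitionSpec and setSortOrder.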
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.