szehon-ho commented on code in PR #4546:
URL: https://github.com/apache/iceberg/pull/4546#discussion_r857879761
##########
hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCatalog.java:
##########
@@ -562,6 +562,60 @@ public void testSetSnapshotSummary() throws Exception {
Assert.assertEquals("The snapshot summary must not be in parameters due to the size limit", 0, parameters.size());
}
+ @Test
+ public void testSetDefaultPartitionSpec() throws Exception {
+ Schema schema = new Schema(
+ required(1, "id", Types.IntegerType.get(), "unique ID"),
+ required(2, "data", Types.StringType.get())
+ );
+ TableIdentifier tableIdent = TableIdentifier.of(DB_NAME, "tbl");
+
+ try {
+ Table table = catalog.buildTable(tableIdent, schema).create();
+ Assert.assertFalse("Must not have default partition spec",
+ hmsTableParameters().containsKey(TableProperties.DEFAULT_PARTITION_SPEC));
+
+ table.updateSpec().addField(bucket("data", 16)).commit();
+ Assert.assertEquals(PartitionSpecParser.toJson(table.spec()),
+ hmsTableParameters().get(TableProperties.DEFAULT_PARTITION_SPEC));
+ } finally {
+ catalog.dropTable(tableIdent);
+ }
+ }
+
+ @Test
+ public void testSetCurrentSchema() throws Exception {
+ Schema schema = new Schema(
+ required(1, "id", Types.IntegerType.get(), "unique ID"),
+ required(2, "data", Types.StringType.get())
+ );
+ TableIdentifier tableIdent = TableIdentifier.of(DB_NAME, "tbl");
+
+ try {
+ Table table = catalog.buildTable(tableIdent, schema).create();
+
+ Assert.assertEquals(SchemaParser.toJson(table.schema()),
+ hmsTableParameters().get(TableProperties.CURRENT_SCHEMA));
+
+ // add many new fields to make the schema json string exceed the limit
+ UpdateSchema updateSchema = table.updateSchema();
+ for (int i = 0; i < 600; i++) {
+ updateSchema.addColumn("new_col_" + i, Types.StringType.get());
+ }
+ updateSchema.commit();
+
+ Assert.assertTrue(SchemaParser.toJson(table.schema()).length() > 32672);
Review Comment:
This is only test code, but could we store SchemaParser.toJson(table.schema())
in a local variable?
##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -433,6 +439,38 @@ void setSnapshotSummary(Map<String, String> parameters, Snapshot currentSnapshot
}
}
+ private void setSchema(TableMetadata metadata, Map<String, String> parameters) {
+ parameters.remove(TableProperties.CURRENT_SCHEMA);
+ if (metadata.schema() != null) {
+ String schema = SchemaParser.toJson(metadata.schema());
+ setField(parameters, TableProperties.CURRENT_SCHEMA, schema);
+ }
+ }
+
+ private void setPartitionSpec(TableMetadata metadata, Map<String, String> parameters) {
+ parameters.remove(TableProperties.DEFAULT_PARTITION_SPEC);
+ if (metadata.spec() != null && metadata.spec().isPartitioned()) {
+ String spec = PartitionSpecParser.toJson(metadata.spec());
+ setField(parameters, TableProperties.DEFAULT_PARTITION_SPEC, spec);
+ }
+ }
+
+ private void setSortOrder(TableMetadata metadata, Map<String, String> parameters) {
+ parameters.remove(TableProperties.DEFAULT_SORT_ORDER);
+ if (metadata.sortOrder() != null && metadata.sortOrder().isSorted()) {
+ String sortOrder = SortOrderParser.toJson(metadata.sortOrder());
+ setField(parameters, TableProperties.DEFAULT_SORT_ORDER, sortOrder);
+ }
+ }
+
+ private void setField(Map<String, String> parameters, String key, String value) {
Review Comment:
One performance suggestion: if the user sets the max size to 0 (which disables
this feature), we can skip the serialization entirely.
Perhaps the easiest way is to add a boolean function such as
exposeInHmsProperties() that returns false when the value is 0, and use it in
all of these methods?
```
if (exposeInHmsProperties() && metadata.sortOrder() != null &&
metadata.sortOrder().isSorted()) {
```
##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -52,8 +52,11 @@
import org.apache.hadoop.hive.metastore.api.hive_metastoreConstants;
Review Comment:
Two other suggestions for this class: can we add one more sentence to the
comment on "HIVE_TABLE_PROPERTY_MAX_SIZE" to let users know how to turn off
the feature?
```
// set to 0 to not expose Iceberg metadata in HMS properties
```
Also, can we add a precondition in the HiveTableOperations constructor to check
that the value is non-negative?
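A minimal sketch of the kind of check I have in mind, written in plain Java so it stands alone. The class and method names here are made up for illustration; in the real constructor this would more likely be a one-line precondition call using whatever utility the project already has.

```java
// Hypothetical helper illustrating the constructor-time validation (names are not from the PR).
class MaxSizeValidationSketch {
  // Returns the value unchanged if it is valid; 0 is allowed and means "feature disabled".
  static long validateMaxSize(long maxSize) {
    if (maxSize < 0) {
      throw new IllegalArgumentException(
          "Invalid max size for HMS table properties, must be non-negative: " + maxSize);
    }
    return maxSize;
  }
}
```

Failing fast in the constructor means a misconfigured negative value is reported once, up front, instead of showing up later as properties quietly going missing.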
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]