[GitHub] [iceberg] pvary commented on a change in pull request #3912: Hive: Support 'identifier-field-ids' when creating table in hive

GitBox Tue, 25 Jan 2022 08:54:05 -0800


pvary commented on a change in pull request #3912:
URL: https://github.com/apache/iceberg/pull/3912#discussion_r791383079




##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveSchemaUtil.java
##########
@@ -75,7 +84,43 @@ public static Schema convert(List<FieldSchema> fieldSchemas, boolean autoConvert
       typeInfos.add(TypeInfoUtils.getTypeInfoFromTypeString(col.getType()));
       comments.add(col.getComment());
     }
-    return HiveSchemaConverter.convert(names, typeInfos, comments, autoConvert);
+    Schema schema = HiveSchemaConverter.convert(names, typeInfos, comments, autoConvert);
+    return rebuildSchemaWithIdentifierFieldIds(schema, identifierFieldNames);

Review comment:
       Nit: one more optimization: do not rebuild the schema if there are no identifier fields.

##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveSchemaUtil.java
##########
@@ -75,7 +84,43 @@ public static Schema convert(List<FieldSchema> fieldSchemas, boolean autoConvert
       typeInfos.add(TypeInfoUtils.getTypeInfoFromTypeString(col.getType()));
       comments.add(col.getComment());
     }
-    return HiveSchemaConverter.convert(names, typeInfos, comments, autoConvert);
+    Schema schema = HiveSchemaConverter.convert(names, typeInfos, comments, autoConvert);
+    return rebuildSchemaWithIdentifierFieldIds(schema, identifierFieldNames);

Review comment:
       Maybe add a quick (early) return inside the method.
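A minimal sketch of the early return suggested in the two comments above. `rebuildSchemaWithIdentifierFieldIds` is the helper name from the diff, but `SimpleSchema` is a hypothetical, simplified stand-in for Iceberg's `Schema` (top-level column name to field id, plus identifier ids), so this only illustrates the control flow, not the PR's implementation.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for org.apache.iceberg.Schema: column name -> field id, plus identifier ids.
final class SimpleSchema {
  final Map<String, Integer> columnIds;
  final Set<Integer> identifierFieldIds;

  SimpleSchema(Map<String, Integer> columnIds, Set<Integer> identifierFieldIds) {
    this.columnIds = Collections.unmodifiableMap(new LinkedHashMap<>(columnIds));
    this.identifierFieldIds = Collections.unmodifiableSet(new TreeSet<>(identifierFieldIds));
  }
}

final class IdentifierFieldRebuild {
  private IdentifierFieldRebuild() {
  }

  static SimpleSchema rebuildSchemaWithIdentifierFieldIds(SimpleSchema schema, Set<String> identifierFieldNames) {
    // Early return: no identifier fields requested, so hand back the converted schema as-is.
    if (identifierFieldNames == null || identifierFieldNames.isEmpty()) {
      return schema;
    }
    Set<Integer> ids = new TreeSet<>();
    for (String name : identifierFieldNames) {
      Integer id = schema.columnIds.get(name);
      if (id == null) {
        throw new IllegalArgumentException("Cannot find identifier field: " + name);
      }
      ids.add(id);
    }
    // Only now pay for building a new schema object.
    return new SimpleSchema(schema.columnIds, ids);
  }

  public static void main(String[] args) {
    Map<String, Integer> cols = new LinkedHashMap<>();
    cols.put("c1", 1);
    cols.put("c2", 2);
    SimpleSchema converted = new SimpleSchema(cols, Collections.emptySet());
    System.out.println(rebuildSchemaWithIdentifierFieldIds(converted, Set.of()) == converted); // true
    System.out.println(rebuildSchemaWithIdentifierFieldIds(converted, Set.of("c1", "c2")).identifierFieldIds); // [1, 2]
  }
}
```

Returning the same instance when nothing changes avoids an allocation on the common path where no identifier fields are configured.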

##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveSchemaUtil.java
##########
@@ -54,7 +58,11 @@ private HiveSchemaUtil() {
    * @return An equivalent Iceberg Schema
    */
   public static Schema convert(List<FieldSchema> fieldSchemas) {
-    return convert(fieldSchemas, false);
+    return convert(fieldSchemas, false, Collections.emptySet());
+  }
+
+  public static Schema convert(List<FieldSchema> fieldSchemas, Set<String> identifierFieldNames) {

Review comment:
       Nit: add javadoc please 
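For reference, a sketch of what javadoc on the new overload could look like, together with the delegation pattern from the diff (the old overload passing `Collections.emptySet()`). The class name and the `Map<String, Boolean>` return type are hypothetical simplifications so the example compiles without Hive or Iceberg on the classpath.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class HiveSchemaUtilSketch {
  private HiveSchemaUtilSketch() {
  }

  /**
   * Converts column names to a simplified schema without identifier fields.
   *
   * @param columnNames the column names, in table order
   * @return a map from column name to whether it is an identifier field (always false here)
   */
  static Map<String, Boolean> convert(List<String> columnNames) {
    // Same delegation pattern as the diff: the old overload passes an empty identifier set.
    return convert(columnNames, Collections.emptySet());
  }

  /**
   * Converts column names to a simplified schema, marking the given identifier fields.
   *
   * @param columnNames          the column names, in table order
   * @param identifierFieldNames names of the columns that form the row identifier;
   *                             must be a subset of {@code columnNames}
   * @return a map from column name to whether it is an identifier field
   * @throws IllegalArgumentException if an identifier field name is not one of the columns
   */
  static Map<String, Boolean> convert(List<String> columnNames, Set<String> identifierFieldNames) {
    Map<String, Boolean> result = new LinkedHashMap<>();
    for (String name : columnNames) {
      result.put(name, identifierFieldNames.contains(name));
    }
    for (String name : identifierFieldNames) {
      if (!result.containsKey(name)) {
        throw new IllegalArgumentException("Unknown identifier field: " + name);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(convert(List.of("c1", "c2"), Set.of("c1"))); // {c1=true, c2=false}
  }
}
```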

##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveSchemaUtil.java
##########
@@ -112,8 +157,10 @@ public static Schema convert(List<String> names, List<TypeInfo> types, List<Stri
    *                    thrown.
    * @return The Iceberg schema
    */
-  public static Schema convert(List<String> names, List<TypeInfo> types, List<String> comments, boolean autoConvert) {
-    return HiveSchemaConverter.convert(names, types, comments, autoConvert);
+  public static Schema convert(List<String> names, List<TypeInfo> types, List<String> comments, boolean autoConvert,

Review comment:
       Fix javadoc 

##########
File path: mr/src/main/java/org/apache/iceberg/mr/InputFormatConfig.java
##########
@@ -49,6 +49,9 @@ private InputFormatConfig() {
   public static final String SERIALIZED_TABLE_PREFIX = "iceberg.mr.serialized.table.";
   public static final String TABLE_CATALOG_PREFIX = "iceberg.mr.table.catalog.";
   public static final String LOCALITY = "iceberg.mr.locality";
+  public static final String IDENTIFIER_FIELD_NAMES = "iceberg.identifier-field-names";
+  // Usually, column names containing ',' are not supported by hive, so we can use ',' as separator.

Review comment:
       AFAIK, with the correct escaping you can have anything in column names. Could you please try that out?
   Thanks, Peter
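To make the concern concrete: if Hive accepts a quoted column whose name contains a comma (the reviewer's point; the column name `a,b` below is a hypothetical example), a plain `,` separator in the table property cannot round-trip it. A minimal illustration of the naive split:

```java
import java.util.Arrays;
import java.util.List;

final class CommaSeparatorPitfall {
  private CommaSeparatorPitfall() {
  }

  // Naive parsing implied by using ',' as the separator: split the property value on commas.
  static List<String> parseNaively(String propertyValue) {
    return Arrays.asList(propertyValue.split(","));
  }

  public static void main(String[] args) {
    // Hypothetical identifier columns: "id" and a quoted column literally named "a,b".
    String value = String.join(",", "id", "a,b");
    List<String> parsed = parseNaively(value);
    System.out.println(parsed);        // [id, a, b]; three names instead of two
    System.out.println(parsed.size()); // 3
  }
}
```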

##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java
##########
@@ -773,4 +773,52 @@ public void testDropHiveTableWithoutUnderlyingTable() throws IOException {
   private String getCurrentSnapshotForHiveCatalogTable(org.apache.iceberg.Table icebergTable) {
     return ((BaseMetastoreTableOperations) ((BaseTable) icebergTable).operations()).currentMetadataLocation();
   }
+
+  @Test
+  public void testCreateTableWithIdentifierIds() {
+    TableIdentifier tableIdentifier = TableIdentifier.of("default", "customers");
+    shell.executeStatement(

Review comment:
       You might have missed this comment earlier 

##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java
##########
@@ -773,4 +773,52 @@ public void testDropHiveTableWithoutUnderlyingTable() throws IOException {
   private String getCurrentSnapshotForHiveCatalogTable(org.apache.iceberg.Table icebergTable) {
     return ((BaseMetastoreTableOperations) ((BaseTable) icebergTable).operations()).currentMetadataLocation();
   }
+
+  @Test
+  public void testCreateTableWithIdentifierIds() {
+    TableIdentifier tableIdentifier = TableIdentifier.of("default", "customers");
+    shell.executeStatement(
+        String.format("CREATE EXTERNAL TABLE %s ( " +
+                "  c1 INT, " +
+                "  c2 STRING, " +
+                "  c3 STRUCT<c4:STRING, c5:STRING> " +
+                ") " +
+                "PARTITIONED BY (c6 STRING) " +
+                "STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' %s " +
+                "TBLPROPERTIES (" +
+                "  '%s' = '%s'," +
+                "  '%s' = '%s'" +
+                ")",
+            tableIdentifier,
+            testTables.locationForCreateTableSQL(tableIdentifier),
+            InputFormatConfig.IDENTIFIER_FIELD_NAMES, "c1,c2",
+            InputFormatConfig.CATALOG_NAME, testTables.catalogName()));
+
+    org.apache.iceberg.Table table = testTables.loadTable(tableIdentifier);
+    Assert.assertEquals(ImmutableSet.of(1, 2), table.schema().identifierFieldIds());
+  }
+
+  @Test
+  public void testCreateTableWithIdentifierIdsError() {
+    TableIdentifier tableIdentifier = TableIdentifier.of("default", "customers");
+    Assert.assertThrows(
+        "Cannot add field c4 as an identifier field: must not in nested field",
+        IllegalArgumentException.class,
+        () -> shell.executeStatement(
+            String.format("CREATE EXTERNAL TABLE %s ( " +

Review comment:
       And this too 
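The two new tests pin down the intended behaviour: top-level `c1,c2` resolve to identifier ids `{1, 2}`, while the nested `c4` (inside `c3`) must be rejected. A minimal sketch of that resolution step, assuming a hypothetical flattened field list with made-up ids (`c1`..`c5` mapped to 1..5, partition column `c6` left out). It mirrors the tests' expectations, not Iceberg's own `Schema` validation rules.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class IdentifierFieldResolver {
  // Hypothetical flattened field: parentId is null for top-level columns.
  static final class Field {
    final int id;
    final String name;
    final Integer parentId;

    Field(int id, String name, Integer parentId) {
      this.id = id;
      this.name = name;
      this.parentId = parentId;
    }
  }

  private IdentifierFieldResolver() {
  }

  static Set<Integer> resolve(List<Field> fields, List<String> identifierFieldNames) {
    Set<Integer> ids = new LinkedHashSet<>();
    for (String name : identifierFieldNames) {
      Field topLevel = null;
      boolean nestedMatch = false;
      for (Field field : fields) {
        if (!field.name.equals(name)) {
          continue;
        }
        if (field.parentId == null) {
          topLevel = field;
        } else {
          nestedMatch = true;
        }
      }
      if (topLevel != null) {
        ids.add(topLevel.id);
      } else if (nestedMatch) {
        throw new IllegalArgumentException(
            "Cannot add field " + name + " as an identifier field: must not be a nested field");
      } else {
        throw new IllegalArgumentException("Cannot find identifier field " + name);
      }
    }
    return ids;
  }

  static List<Field> customersSchema() {
    // c1 INT, c2 STRING, c3 STRUCT<c4:STRING, c5:STRING>; ids assumed for illustration.
    return List.of(
        new Field(1, "c1", null),
        new Field(2, "c2", null),
        new Field(3, "c3", null),
        new Field(4, "c4", 3),
        new Field(5, "c5", 3));
  }

  public static void main(String[] args) {
    System.out.println(resolve(customersSchema(), List.of("c1", "c2"))); // [1, 2]
    try {
      resolve(customersSchema(), List.of("c4"));
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```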

##########
File path: mr/src/main/java/org/apache/iceberg/mr/InputFormatConfig.java
##########
@@ -49,6 +49,9 @@ private InputFormatConfig() {
   public static final String SERIALIZED_TABLE_PREFIX = "iceberg.mr.serialized.table.";
   public static final String TABLE_CATALOG_PREFIX = "iceberg.mr.table.catalog.";
   public static final String LOCALITY = "iceberg.mr.locality";
+  public static final String IDENTIFIER_FIELD_NAMES = "iceberg.identifier-field-names";
+  // Usually, column names containing ',' are not supported by hive, so we can use ',' as separator.

Review comment:
       We work around this when pushing column names by using another config that lets us provide the separator too. It could default to `,`, so it would rarely be used, but it allows the user to fix the schema if needed.
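A minimal sketch of the reviewer's suggestion: read the identifier field names with a separator taken from a companion property that defaults to `,`. The property name `iceberg.identifier-field-names.delimiter` is hypothetical (it is not part of the PR); only `iceberg.identifier-field-names` comes from the diff. Names are not trimmed, since quoted column names may legitimately contain spaces.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

final class IdentifierFieldNamesConfig {
  static final String IDENTIFIER_FIELD_NAMES = "iceberg.identifier-field-names";
  // Hypothetical companion property, not part of the PR.
  static final String IDENTIFIER_FIELD_NAMES_DELIMITER = "iceberg.identifier-field-names.delimiter";
  static final String DEFAULT_DELIMITER = ",";

  private IdentifierFieldNamesConfig() {
  }

  static Set<String> identifierFieldNames(Map<String, String> tableProperties) {
    String value = tableProperties.get(IDENTIFIER_FIELD_NAMES);
    if (value == null || value.isEmpty()) {
      return Collections.emptySet();
    }
    String delimiter = tableProperties.getOrDefault(IDENTIFIER_FIELD_NAMES_DELIMITER, DEFAULT_DELIMITER);
    Set<String> names = new LinkedHashSet<>();
    // Pattern.quote treats the delimiter literally, so e.g. "|" is not read as regex alternation.
    for (String name : value.split(Pattern.quote(delimiter))) {
      if (!name.isEmpty()) {
        names.add(name);
      }
    }
    return names;
  }

  public static void main(String[] args) {
    System.out.println(identifierFieldNames(Map.of(IDENTIFIER_FIELD_NAMES, "c1,c2"))); // [c1, c2]
    System.out.println(identifierFieldNames(Map.of(
        IDENTIFIER_FIELD_NAMES, "id|a,b",
        IDENTIFIER_FIELD_NAMES_DELIMITER, "|"))); // [id, a,b]
  }
}
```

With the default left in place, existing `c1,c2` style values keep working; only users with unusual column names need to set the extra property.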



