rdblue commented on a change in pull request #1612:
URL: https://github.com/apache/iceberg/pull/1612#discussion_r511128474
##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##########
@@ -89,22 +94,39 @@ public void preCreateTable(org.apache.hadoop.hive.metastore.api.Table hmsTable)
     }
     // If the table does not exist collect data for table creation
-    String schemaString = catalogProperties.getProperty(InputFormatConfig.TABLE_SCHEMA);
-    Preconditions.checkNotNull(schemaString, "Please provide a table schema");
-    // Just check if it is parsable, and later use for partition specification parsing
-    Schema schema = SchemaParser.fromJson(schemaString);
-
-    String specString = catalogProperties.getProperty(InputFormatConfig.PARTITION_SPEC);
-    if (specString != null) {
-      // Just check if it is parsable
-      PartitionSpecParser.fromJson(schema, specString);
+    // - InputFormatConfig.TABLE_SCHEMA and InputFormatConfig.PARTITION_SPEC take precedence, so the user can
+    //   override the Iceberg schema and specification generated by the code
+    // - Partitioned Hive tables are converted to non-partitioned Hive tables, and the Iceberg partition
+    //   specification is generated automatically based on the provided columns. If the MetaStore table contains
+    //   partitioning information then:
+    //   - Merge the normal and partition columns for the table we are creating
+    //   - Remove the partition columns from the table we are creating
+    //   - Create the Iceberg partition specification using the partition columns
+
+    Schema schema = schema(catalogProperties, hmsTable);
+    PartitionSpec spec = spec(schema, catalogProperties, hmsTable);
+
+    catalogProperties.put(InputFormatConfig.TABLE_SCHEMA, SchemaParser.toJson(schema));
+    catalogProperties.put(InputFormatConfig.PARTITION_SPEC, PartitionSpecParser.toJson(spec));
+
+    // Merge the partition columns into the normal columns, since Hive table reads only work on non-partitioned
+    // tables
+    if (hmsTable.getPartitionKeys() != null && !hmsTable.getPartitionKeys().isEmpty()) {
+      hmsTable.getSd().getCols().addAll(hmsTable.getPartitionKeys());
+      hmsTable.setPartitionKeysIsSet(false);
     }
     // Allow purging table data if the table is created now and not set otherwise
     if (hmsTable.getParameters().get(InputFormatConfig.EXTERNAL_TABLE_PURGE) == null) {
       hmsTable.getParameters().put(InputFormatConfig.EXTERNAL_TABLE_PURGE, "TRUE");
     }
+    // If the table is not managed by the Hive catalog then the location should be set
+    if (!Catalogs.hiveCatalog(conf)) {
+      Preconditions.checkArgument(hmsTable.getSd() != null && hmsTable.getSd().getLocation() != null,
+          "Table location not set");
Review comment:
I think it would be safer to detect the Hadoop catalog, right? There
could be other catalogs that are not Hive but also don't require a location.
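
   For illustration, a minimal sketch of the narrower check being suggested. The `hadoopCatalog` helper, the "iceberg.mr.catalog" property name, and its "hadoop" value are assumptions for this sketch rather than confirmed Iceberg API, and which catalogs actually require an explicit location is exactly the open question raised above:

   ```java
   import org.apache.hadoop.conf.Configuration;
   import org.apache.iceberg.relocated.com.google.common.base.Preconditions;

   // Hypothetical helper: detect the Hadoop catalog specifically, assuming the
   // catalog implementation is selected through the "iceberg.mr.catalog"
   // property and that the value "hadoop" selects HadoopCatalog (both are
   // assumptions for this sketch).
   private static boolean hadoopCatalog(Configuration conf) {
     return "hadoop".equalsIgnoreCase(conf.get("iceberg.mr.catalog"));
   }

   // Gate the location check on the detected catalog instead of applying it
   // to every catalog that is not Hive.
   if (hadoopCatalog(conf)) {
     Preconditions.checkArgument(hmsTable.getSd() != null && hmsTable.getSd().getLocation() != null,
         "Table location not set");
   }
   ```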
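
   Separately, the hunk's comment block describes generating the Iceberg partition specification from the Hive partition columns. A hedged sketch of what that generation could look like, assuming identity partitioning on each Hive partition key; the actual spec(schema, catalogProperties, hmsTable) helper in this PR may differ:

   ```java
   import java.util.List;
   import org.apache.hadoop.hive.metastore.api.FieldSchema;
   import org.apache.iceberg.PartitionSpec;
   import org.apache.iceberg.Schema;

   // Hypothetical helper: build a PartitionSpec with one identity partition
   // field per Hive partition column. The partition columns must already be
   // present in the merged Iceberg schema so identity() can resolve them.
   static PartitionSpec identitySpec(Schema schema, List<FieldSchema> partitionKeys) {
     PartitionSpec.Builder builder = PartitionSpec.builderFor(schema);
     for (FieldSchema key : partitionKeys) {
       builder.identity(key.getName());
     }
     return builder.build();
   }
   ```

   Identity partitioning mirrors Hive's directory-per-value partition layout, which is why it is the natural translation when converting a partitioned Hive table.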