rdblue commented on a change in pull request #1612:
URL: https://github.com/apache/iceberg/pull/1612#discussion_r511128474
##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##########
@@ -89,22 +94,39 @@ public void preCreateTable(org.apache.hadoop.hive.metastore.api.Table hmsTable)
     }
     // If the table does not exist collect data for table creation
-    String schemaString = catalogProperties.getProperty(InputFormatConfig.TABLE_SCHEMA);
-    Preconditions.checkNotNull(schemaString, "Please provide a table schema");
-    // Just check if it is parsable, and later use for partition specification parsing
-    Schema schema = SchemaParser.fromJson(schemaString);
-
-    String specString = catalogProperties.getProperty(InputFormatConfig.PARTITION_SPEC);
-    if (specString != null) {
-      // Just check if it is parsable
-      PartitionSpecParser.fromJson(schema, specString);
+    // - InputFormatConfig.TABLE_SCHEMA and InputFormatConfig.PARTITION_SPEC take precedence, so the user can
+    //   override the Iceberg schema and specification generated by the code
+    // - Partitioned Hive tables are converted to non-partitioned Hive tables, and the Iceberg partition
+    //   specification is generated automatically based on the provided columns. If the MetaStore table contains
+    //   partitioning information then:
+    //   - Merge the normal and partition columns for the table we are creating
+    //   - Remove the partition columns from the table we are creating
+    //   - Create the Iceberg partition specification using the partition columns
+
+    Schema schema = schema(catalogProperties, hmsTable);
+    PartitionSpec spec = spec(schema, catalogProperties, hmsTable);
+
+    catalogProperties.put(InputFormatConfig.TABLE_SCHEMA, SchemaParser.toJson(schema));
+    catalogProperties.put(InputFormatConfig.PARTITION_SPEC, PartitionSpecParser.toJson(spec));
+
+    // Merge the partition columns into the normal columns, since Hive table reads only work on non-partitioned
+    // tables
+    if (hmsTable.getPartitionKeys() != null && !hmsTable.getPartitionKeys().isEmpty()) {
+      hmsTable.getSd().getCols().addAll(hmsTable.getPartitionKeys());
+      hmsTable.setPartitionKeysIsSet(false);
     }
     // Allow purging table data if the table is created now and not set otherwise
     if (hmsTable.getParameters().get(InputFormatConfig.EXTERNAL_TABLE_PURGE) == null) {
       hmsTable.getParameters().put(InputFormatConfig.EXTERNAL_TABLE_PURGE, "TRUE");
     }
+    // If the table is not managed by the Hive catalog then the location should be set
+    if (!Catalogs.hiveCatalog(conf)) {
+      Preconditions.checkArgument(hmsTable.getSd() != null && hmsTable.getSd().getLocation() != null,
+          "Table location not set");
Review comment:
I think it would be safer to detect the Hadoop catalog, right? There
could be other catalogs that are not Hive but also don't require a location.
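
   For illustration, a minimal sketch of the narrower check being suggested. The `hadoopCatalog` helper, the "iceberg.mr.catalog" property name, and its "hadoop" value are assumptions for this sketch rather than confirmed Iceberg API, and which catalogs actually require an explicit location is exactly the open question raised above:

   ```java
   import org.apache.hadoop.conf.Configuration;
   import org.apache.iceberg.relocated.com.google.common.base.Preconditions;

   // Hypothetical helper: detect the Hadoop catalog specifically, assuming the
   // catalog implementation is selected through the "iceberg.mr.catalog"
   // property and that the value "hadoop" selects HadoopCatalog (both are
   // assumptions for this sketch).
   private static boolean hadoopCatalog(Configuration conf) {
     return "hadoop".equalsIgnoreCase(conf.get("iceberg.mr.catalog"));
   }

   // Gate the location check on the detected catalog instead of applying it
   // to every catalog that is not Hive.
   if (hadoopCatalog(conf)) {
     Preconditions.checkArgument(hmsTable.getSd() != null && hmsTable.getSd().getLocation() != null,
         "Table location not set");
   }
   ```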
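
   Separately, the hunk's comment block describes generating the Iceberg partition specification from the Hive partition columns. A hedged sketch of what that generation could look like, assuming identity partitioning on each Hive partition key; the actual spec(schema, catalogProperties, hmsTable) helper in this PR may differ:

   ```java
   import java.util.List;
   import org.apache.hadoop.hive.metastore.api.FieldSchema;
   import org.apache.iceberg.PartitionSpec;
   import org.apache.iceberg.Schema;

   // Hypothetical helper: build a PartitionSpec with one identity partition
   // field per Hive partition column. The partition columns must already be
   // present in the merged Iceberg schema so identity() can resolve them.
   static PartitionSpec identitySpec(Schema schema, List<FieldSchema> partitionKeys) {
     PartitionSpec.Builder builder = PartitionSpec.builderFor(schema);
     for (FieldSchema key : partitionKeys) {
       builder.identity(key.getName());
     }
     return builder.build();
   }
   ```

   Identity partitioning mirrors Hive's directory-per-value partition layout, which is why it is the natural translation when converting a partitioned Hive table.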