pvary commented on a change in pull request #1612:
URL: https://github.com/apache/iceberg/pull/1612#discussion_r522867441
##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##########
@@ -80,6 +81,10 @@ public void
preCreateTable(org.apache.hadoop.hive.metastore.api.Table hmsTable)
Preconditions.checkArgument(catalogProperties.getProperty(InputFormatConfig.PARTITION_SPEC)
== null,
"Iceberg table already created - can not use provided partition
specification");
+ Schema hmsSchema = HiveSchemaUtil.schema(hmsTable.getSd().getCols());
+ Preconditions.checkArgument(HiveSchemaUtil.compatible(hmsSchema,
icebergTable.schema()),
+ "Iceberg table already created - with different specification");
Review comment:
The Hive API messes up with us. Here we always receive the Hive column
definition which was generated based on the SerDe.getObjectInspector. The SerDe
either uses the columns provided or the Iceberg table schema. We can try to
find a way to massage through the API the information whether the source of the
schema is an Iceberg table or provided in command line, but in the medium-long
term I think we would like to decouple the Iceberg schema and the Hive schema
anyway.
Just a few examples where decoupling could be useful:
- Iceberg UUID column - we might want to map to STRING / VARCHAR / CHAR or
even BINARY
- Iceberg Time - we might want to find a way to at least read these columns
to some of the Hive types
- We might want to create an Hive table which only contains a few column
from the underlying Iceberg table
So instead of providing a throw-away temporary solution I decided to create
the first implementation which checks the compatibility of the 2 schemas. The
current implementation only allows exact matches, but it could be easily
changed to allow differences if the ObjectInspectors are improved to handle the
conversions.
Does this makes sense?
Thanks,
Peter
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]