pvary commented on a change in pull request #1612:
URL: https://github.com/apache/iceberg/pull/1612#discussion_r522867441



##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##########
@@ -80,6 +81,10 @@ public void 
preCreateTable(org.apache.hadoop.hive.metastore.api.Table hmsTable)
         
Preconditions.checkArgument(catalogProperties.getProperty(InputFormatConfig.PARTITION_SPEC)
 == null,
             "Iceberg table already created - can not use provided partition 
specification");
 
+        Schema hmsSchema = HiveSchemaUtil.schema(hmsTable.getSd().getCols());
+        Preconditions.checkArgument(HiveSchemaUtil.compatible(hmsSchema, 
icebergTable.schema()),
+            "Iceberg table already created - with different specification");

Review comment:
       The Hive API messes up with us. Here we always receive the Hive column 
definition which was generated based on the SerDe.getObjectInspector. The SerDe 
either uses the columns provided or the Iceberg table schema. We can try to 
find a way to massage through the API the information whether the source of the 
schema is an Iceberg table or provided in command line, but in the medium-long 
term I think we would like to decouple the Iceberg schema and the Hive schema 
anyway.
   
   Just a few examples where decoupling could be useful:
   - Iceberg UUID column - we might want to map to STRING / VARCHAR / CHAR or 
even BINARY
   - Iceberg Time - we might want to find a way to at least read these columns 
to some of the Hive types
   - We might want to create an Hive table which only contains a few column 
from the underlying Iceberg table
   
   So instead of providing a throw-away temporary solution I decided to create 
the first implementation which checks the compatibility of the 2 schemas. The 
current implementation only allows exact matches, but it could be easily 
changed to allow differences if the ObjectInspectors are improved to handle the 
conversions.
   
   Does this makes sense?
   Thanks,
   Peter




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to