pvary commented on a change in pull request #1612:
URL: https://github.com/apache/iceberg/pull/1612#discussion_r522867441



##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##########
@@ -80,6 +81,10 @@ public void 
preCreateTable(org.apache.hadoop.hive.metastore.api.Table hmsTable)
         
Preconditions.checkArgument(catalogProperties.getProperty(InputFormatConfig.PARTITION_SPEC)
 == null,
             "Iceberg table already created - can not use provided partition 
specification");
 
+        Schema hmsSchema = HiveSchemaUtil.schema(hmsTable.getSd().getCols());
+        Preconditions.checkArgument(HiveSchemaUtil.compatible(hmsSchema, 
icebergTable.schema()),
+            "Iceberg table already created - with different specification");

Review comment:
       The Hive API messes up with us. Here we always receive the Hive column 
definition which was generated based on the SerDe.getObjectInspector. The 
objectInspector generation either uses the columns provided in the query or 
columns generated based on the Iceberg table schema. We can try to find a way 
to massage through the API the information whether the source of the schema is 
an Iceberg table or provided in the query, but in the medium-long term I think 
we would like to decouple the Iceberg schema and the Hive schema anyway.
   
   Just a few examples where decoupling could be useful:
   - Iceberg UUID column - we might want to map to STRING / VARCHAR / CHAR or 
even BINARY
   - Iceberg Time - we might want to find a way to read these columns to some 
of the Hive types
   - We might want to create an Hive table which only contains a few column 
from the underlying Iceberg table
   
   So instead of providing a throw-away temporary solution I decided to create 
the first implementation which checks the compatibility of the 2 schemas. The 
current implementation only allows exact matches, but it could be easily 
changed to allow differences if the ObjectInspectors are improved to handle the 
conversions.
   
   Does this makes sense?
   Thanks,
   Peter




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to