pvary commented on a change in pull request #1612:
URL: https://github.com/apache/iceberg/pull/1612#discussion_r530190583
##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergSerDe.java
##########
@@ -56,10 +61,28 @@ public void initialize(@Nullable Configuration configuration, Properties serDePr
} else if (serDeProperties.get(InputFormatConfig.TABLE_SCHEMA) != null) {
tableSchema = SchemaParser.fromJson((String) serDeProperties.get(InputFormatConfig.TABLE_SCHEMA));
} else {
- try {
- tableSchema = Catalogs.loadTable(configuration, serDeProperties).schema();
- } catch (NoSuchTableException nte) {
- throw new SerDeException("Please provide an existing table or a valid schema", nte);
+ if (Catalogs.hiveCatalog(configuration)) {
Review comment:
HiveTableOperations converts the table schema to Hive columns /
StorageDescriptor whenever a change is committed to the table. This means that
the Iceberg schema and the Hive schema are always synchronized.
Given this synchronization, I think it is better to use the "cached" schema
instead of loading the table again and again. This might change when we clean
up Timestamps / UUIDs, since the mapping is not 1-to-1 there, but I would
leave something for that new PR too 😄
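
To illustrate the idea, here is a minimal, dependency-free sketch of "use the
cached schema, load the table only as a fallback". It is not the code in this
PR. The class `CachedSchemaSketch`, the methods `resolveSchema` and
`fromHiveColumns`, and the `Map<String, String>` schema representation are all
hypothetical. The property keys `columns` and `columns.types`, the comma and
colon separators, and the restriction to primitive types are assumptions made
for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.function.Supplier;

final class CachedSchemaSketch {
  // Assumed keys under which Hive passes column metadata to a SerDe.
  static final String COLUMNS = "columns";
  static final String COLUMN_TYPES = "columns.types";

  /**
   * Returns column name -> type name. When the table lives in a Hive catalog and
   * the SerDe properties already carry the columns, build the schema from them;
   * otherwise fall back to the (expensive) table load.
   */
  static Map<String, String> resolveSchema(
      Properties serDeProperties,
      boolean hiveCatalog,
      Supplier<Map<String, String>> tableLoader) {
    String names = serDeProperties.getProperty(COLUMNS);
    String types = serDeProperties.getProperty(COLUMN_TYPES);
    if (hiveCatalog && names != null && types != null && !names.isEmpty()) {
      return fromHiveColumns(names, types);
    }
    return tableLoader.get();
  }

  // Simplification: comma-separated names, colon-separated primitive type names.
  static Map<String, String> fromHiveColumns(String names, String types) {
    String[] n = names.split(",");
    String[] t = types.split(":");
    if (n.length != t.length) {
      throw new IllegalArgumentException("Column names and types differ in length");
    }
    Map<String, String> schema = new LinkedHashMap<>();
    for (int i = 0; i < n.length; i++) {
      schema.put(n[i].trim(), t[i].trim());
    }
    return schema;
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(COLUMNS, "id,name");
    props.setProperty(COLUMN_TYPES, "bigint:string");
    // The loader throws to show that the cached path never touches the table.
    Map<String, String> schema = resolveSchema(props, true, () -> {
      throw new IllegalStateException("table load should not be needed");
    });
    System.out.println(schema);
  }
}
```

Passing the loader as a `Supplier` keeps the expensive call lazy, so it only
runs when the cached columns are missing or the catalog is not a Hive catalog.
A real version would also need to handle the Timestamp / UUID cases mentioned
above, where the Hive type alone does not determine the Iceberg type.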