pvary commented on pull request #1505:
URL: https://github.com/apache/iceberg/pull/1505#issuecomment-698477361
> > > The changes look good. Just a quick sanity check: what happens in Hive
if the serde or storage handler class is missing? Will this break metastore
operations on the table?
> > > I remember that a missing class used to break any query that loaded
the table, including DROP TABLE. So you couldn't even clean up problems.
> >
> >
> > That was my fear as well, but did not have time to test them out yet. It
might even cause issues for users who are not using the Hive tables, just want
to use HiveCatalog to store the table versions.
>
> I'm pretty sure that if you try to perform any operations on tables with
the StorageHandler set on them, and you don't have that class on the classpath,
it will fail with an error. I've been in exactly the situation you describe
where I couldn't issue a DROP TABLE for a table that was created by a test
storage handler that no longer existed.
The tests caught the exact same problem:
```
org.apache.iceberg.spark.sql.TestAlterTable > testSetTableProperties[2]
FAILED
java.lang.RuntimeException:
org.apache.hadoop.hive.ql.metadata.HiveException: Error in loading storage
handler.org.apache.iceberg.mr.hive.HiveIcebergStorageHandler
at
org.apache.hadoop.hive.ql.metadata.Table.getStorageHandler(Table.java:297)
at
org.apache.spark.sql.hive.client.HiveClientImpl.convertHiveTableToCatalogTable(HiveClientImpl.scala:465)
at
org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$3(HiveClientImpl.scala:424)
at scala.Option.map(Option.scala:230)
at
org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$1(HiveClientImpl.scala:424)
```
My first ideas:
1. Check that the classes are on the classpath before creating the table? -
This would only fix the problem if the creator/reader of the table is both have
the same things on the classpath.
2. Move the HiveIcebergStorageHandler / HiveIcebergSerDe to their own
package and try to shave of the dependencies - Seems like a hard solution and
we will eventually fall back to Reflection
3. Go full reflection: Create a reflection based SerDe, StorageHandler which
checks the classpath for the actual classes and if successful wraps them?
@rdblue, @massdosage: What do you think about the 3rd option?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]