edgarRd opened a new pull request #2598:
URL: https://github.com/apache/iceberg/pull/2598


   When running in Hive / Tez, I'm hitting the following error with the current 
`master` branch:
   
   ```
   Vertex failed, vertexName=Map 1, vertexId=vertex_1613777207443_50159_1_00, diagnostics=[Vertex vertex_1613777207443_50159_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: my_table initializer failed, vertex=vertex_1613777207443_50159_1_00 [Map 1], java.lang.NoClassDefFoundError: org/apache/hadoop/hive/metastore/HiveMetaHook
        at org.apache.iceberg.mr.mapreduce.IcebergInputFormat.getSplits(IcebergInputFormat.java:99)
        at org.apache.iceberg.mr.mapred.MapredIcebergInputFormat.getSplits(MapredIcebergInputFormat.java:68)
        at org.apache.iceberg.mr.hive.HiveIcebergInputFormat.getSplits(HiveIcebergInputFormat.java:72)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:442)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:561)
        at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:196)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
   Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.metastore.HiveMetaHook
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        ... 18 more
   ]
   ```
   
   After some debugging, I noticed that the missing class 
`org.apache.hadoop.hive.metastore.HiveMetaHook` is part of the `hive-metastore` 
package, and that the dependency on it was introduced by 
https://github.com/apache/iceberg/commit/d1510340eaff68d88a2e8194d58e7e493af02bcc#diff-9f974af5a35965b695ad7b3a1fa0d806d4748e890dabd015c538326def44d289R99.
 The issue seems to be that, because `IcebergInputFormat` now imports 
`HiveIcebergStorageHandler`, the Tez side has to resolve all of the other Hive 
dependencies of that class as well.
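
One way to confirm which classes are actually visible on the Tez side is a small classpath probe. This is only a sketch using plain JDK calls; `ClasspathProbe` and `isLoadable` are hypothetical names and are not part of Iceberg, Hive, or Tez.

```java
// Hypothetical diagnostic helper, not part of Iceberg, Hive, or Tez.
final class ClasspathProbe {
  private ClasspathProbe() {
  }

  // Returns true if the named class can be loaded by the class loader that
  // loaded this helper. Passing initialize=false avoids running the target
  // class's static initializers while probing.
  static boolean isLoadable(String className) {
    try {
      Class.forName(className, false, ClasspathProbe.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      // LinkageError covers NoClassDefFoundError, which is thrown when a
      // supertype or other dependency of the class cannot be resolved.
      return false;
    }
  }
}
```

Calling `ClasspathProbe.isLoadable("org.apache.hadoop.hive.metastore.HiveMetaHook")` from code running in the failing vertex would show whether `hive-metastore` is on that classpath.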
   
   Since `HiveIcebergStorageHandler` is referenced there only for its `table` 
function, this PR avoids that dependency by extracting the functions that do 
deserialization and do not depend on Hive out of `HiveIcebergStorageHandler`. 
This removes the requirement to have the Hive classes on the classpath on the 
Tez execution side.
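
To illustrate the idea (not the actual change in this PR), here is a minimal sketch of a deserialization helper that depends only on JDK classes, so loading it never pulls in Hive types. Everything in it is an assumption made for illustration: the class name `TableSerDeSketch`, the `SERIALIZED_PREFIX` key, and the use of a plain `Map<String, String>` as a stand-in for the job configuration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Base64;
import java.util.Map;

// Hypothetical helper: it references no Hive classes, so a caller such as an
// input format can use it without Hive jars on the classpath.
final class TableSerDeSketch {
  // Assumed config key prefix, for illustration only.
  static final String SERIALIZED_PREFIX = "sketch.serialized.table.";

  private TableSerDeSketch() {
  }

  // Serializes an object with Java serialization and encodes it as Base64.
  static String serializeToBase64(Serializable obj) {
    try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
         ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(obj);
      out.flush();
      return Base64.getEncoder().encodeToString(bytes.toByteArray());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Decodes a Base64 string and deserializes the object it contains.
  @SuppressWarnings("unchecked")
  static <T> T deserializeFromBase64(String base64) {
    byte[] data = Base64.getDecoder().decode(base64);
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
      return (T) in.readObject();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException("Could not load serialized class", e);
    }
  }

  // Looks up the serialized object stored for the given name and deserializes it.
  // The Map stands in for the job configuration in this sketch.
  static <T> T table(Map<String, String> conf, String name) {
    String value = conf.get(SERIALIZED_PREFIX + name);
    if (value == null) {
      throw new IllegalArgumentException("No serialized table found for: " + name);
    }
    return deserializeFromBase64(value);
  }
}
```

The point of the sketch is the dependency direction: the storage handler can delegate to a helper like this, while the input format calls the helper directly and never references the storage handler.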
   
   PTAL @pvary @massdosage

