edgarRd opened a new pull request #2598:
URL: https://github.com/apache/iceberg/pull/2598
When running in Hive / Tez, I'm hitting the following error with the current
`master` branch:
```
Vertex failed, vertexName=Map 1, vertexId=vertex_1613777207443_50159_1_00,
diagnostics=[Vertex vertex_1613777207443_50159_1_00 [Map 1] killed/failed due
to:ROOT_INPUT_INIT_FAILURE, Vertex Input: my_table initializer failed,
vertex=vertex_1613777207443_50159_1_00 [Map 1], java.lang.NoClassDefFoundError:
org/apache/hadoop/hive/metastore/HiveMetaHook
at org.apache.iceberg.mr.mapreduce.IcebergInputFormat.getSplits(IcebergInputFormat.java:99)
at org.apache.iceberg.mr.mapred.MapredIcebergInputFormat.getSplits(MapredIcebergInputFormat.java:68)
at org.apache.iceberg.mr.hive.HiveIcebergInputFormat.getSplits(HiveIcebergInputFormat.java:72)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:442)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:561)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:196)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.metastore.HiveMetaHook
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 18 more
]
```
After some debugging, I found that the missing class
`org.apache.hadoop.hive.metastore.HiveMetaHook` belongs to the `hive-metastore`
module, and that the dependency on it was introduced by
https://github.com/apache/iceberg/commit/d1510340eaff68d88a2e8194d58e7e493af02bcc#diff-9f974af5a35965b695ad7b3a1fa0d806d4748e890dabd015c538326def44d289R99.
The issue seems to be that `IcebergInputFormat` now imports
`HiveIcebergStorageHandler`, so on the Tez side the class loader also has to
resolve the other Hive classes that `HiveIcebergStorageHandler` depends on.
Since `HiveIcebergStorageHandler` is referenced there only for its `table`
function, this PR avoids that dependency by extracting the deserialization
functions that do not depend on Hive out of `HiveIcebergStorageHandler`. This
removes the requirement to have the Hive classes on the classpath on the Tez
execution side.
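To illustrate the idea, here is a minimal sketch of a Hive-free helper, not the actual change in this PR. The class name `TableSerde`, the key prefix, and the use of a plain `Map` in place of the Hadoop `Configuration` are all assumptions made to keep the example self-contained. The point is that a class which imports nothing from Hive can be loaded on the Tez side without `hive-metastore` on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Base64;
import java.util.Map;

// Hypothetical helper: it references no Hive classes, so loading it never
// forces the class loader to resolve HiveMetaHook or other hive-metastore types.
final class TableSerde {
  // Assumed key prefix, for illustration only.
  static final String SERIALIZED_TABLE_PREFIX = "example.serialized.table.";

  private TableSerde() {
  }

  // Serializes the object and stores it Base64-encoded under the table's key.
  static void put(Map<String, String> conf, String name, Serializable table) {
    try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
         ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(table);
      out.flush();
      conf.put(SERIALIZED_TABLE_PREFIX + name,
          Base64.getEncoder().encodeToString(bytes.toByteArray()));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Reads the object back; this is the part an input format would call.
  @SuppressWarnings("unchecked")
  static <T> T table(Map<String, String> conf, String name) {
    String encoded = conf.get(SERIALIZED_TABLE_PREFIX + name);
    if (encoded == null) {
      throw new IllegalArgumentException("No serialized table for " + name);
    }
    byte[] data = Base64.getDecoder().decode(encoded);
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
      return (T) in.readObject();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException("Cannot deserialize table " + name, e);
    }
  }
}
```

With a helper like this, the storage handler would call `TableSerde.put(...)` when configuring the job, while the input format would only call `TableSerde.table(...)`, leaving no compile-time reference from the input format to the storage handler.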
PTAL @pvary @massdosage