Vikram Bohra created GOBBLIN-1845:
-------------------------------------
Summary: Java parallel stream usage causes class loader conflict when run with spark
Key: GOBBLIN-1845
URL: https://issues.apache.org/jira/browse/GOBBLIN-1845
Project: Apache Gobblin
Issue Type: Task
Reporter: Vikram Bohra
DatasetsFinderFilteringDecorator uses a parallel stream over the datasets to
filter them by predicates. When this code runs in Spark, the system class loader
is used to pick up the Hive jar instead of the current context class loader,
which leads to ClassNotFound issues.
Stack trace:
{code:java}
Caused by: MetaException(message:org.apache.hadoop.hive.metastore.HiveMetaStoreClient class not found)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.getClass(MetaStoreUtils.java:1494)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:130)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:98)
    at org.apache.gobblin.hive.HiveMetaStoreClientFactory.createMetaStoreClient(HiveMetaStoreClientFactory.java:100)
    at org.apache.gobblin.hive.HiveMetaStoreClientFactory.create(HiveMetaStoreClientFactory.java:106)
{code}
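One possible workaround, sketched here under stated assumptions and not taken from the Gobblin code base, is to capture the caller's context class loader before the parallel stream starts and install it on whichever worker thread evaluates the predicate. The class name ContextClassLoaderFilter and the helper filterWithCallerClassLoader are hypothetical. The exact loader that common-pool worker threads carry can vary by JDK version, so this only illustrates the general idea; switching to a sequential stream would be a simpler alternative.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical illustration; not part of Apache Gobblin.
class ContextClassLoaderFilter {

  // Filters items in parallel while making the caller's context class loader
  // visible to the predicate on every thread that evaluates it.
  static <T> List<T> filterWithCallerClassLoader(List<T> items, Predicate<? super T> predicate) {
    // Captured on the calling thread, before any worker thread is involved.
    final ClassLoader callerLoader = Thread.currentThread().getContextClassLoader();
    return items.parallelStream().filter(item -> {
      Thread current = Thread.currentThread();
      ClassLoader previous = current.getContextClassLoader();
      current.setContextClassLoader(callerLoader);
      try {
        return predicate.test(item);
      } finally {
        // Restore the loader so pooled threads are left unchanged.
        current.setContextClassLoader(previous);
      }
    }).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Stand-in for an application-specific loader such as the one Spark installs.
    ClassLoader appLoader = new URLClassLoader(new URL[0], ClassLoader.getSystemClassLoader());
    Thread.currentThread().setContextClassLoader(appLoader);

    List<Integer> datasets = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
    // The predicate keeps an item only if it sees the caller's loader.
    List<Integer> kept = filterWithCallerClassLoader(
        datasets, d -> Thread.currentThread().getContextClassLoader() == appLoader);
    System.out.println("items evaluated with the caller's loader: " + kept.size());
  }
}
```

Restoring the previous loader in the finally block matters because the common pool's threads are shared across the JVM, so leaving a job-specific loader installed could affect unrelated tasks.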