[jira] [Created] (GOBBLIN-1845) Java parallel stream usage causes class loader conflict when run with spark

Vikram Bohra (Jira) Fri, 16 Jun 2023 22:19:01 -0700

Vikram Bohra created GOBBLIN-1845:
-------------------------------------

             Summary: Java parallel stream usage causes class loader conflict when run with spark
                 Key: GOBBLIN-1845
                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1845
             Project: Apache Gobblin
          Issue Type: Task
            Reporter: Vikram Bohra



DatasetsFinderFilteringDecorator uses a parallel stream over the datasets to 
filter them by predicates. When this code runs in Spark, the system class 
loader is used to pick up the Hive jar instead of the current context class 
loader, which leads to ClassNotFound issues.

Stack trace:
{code:java}
Caused by: MetaException(message:org.apache.hadoop.hive.metastore.HiveMetaStoreClient class not found)
        at org.apache.hadoop.hive.metastore.MetaStoreUtils.getClass(MetaStoreUtils.java:1494)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:130)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:98)
        at org.apache.gobblin.hive.HiveMetaStoreClientFactory.createMetaStoreClient(HiveMetaStoreClientFactory.java:100)
        at org.apache.gobblin.hive.HiveMetaStoreClientFactory.create(HiveMetaStoreClientFactory.java:106)
 {code}
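
For illustration only, here is a minimal sketch of one possible workaround for the 
behaviour described in this report. It is not the fix adopted in Gobblin. It 
assumes the problem is that the worker threads running the parallel stream may 
not carry the caller's context class loader, which can depend on the JDK 
version and on how the pool's threads were created. The class 
ContextClassLoaderSketch and its helpers withContextClassLoader and 
filterInParallel are hypothetical names, and the URLClassLoader in main is only 
a stand-in for the loader Spark installs on the calling thread.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class ContextClassLoaderSketch {

    // Hypothetical helper (not part of Gobblin): wraps a predicate so that it
    // runs with the given loader as the thread context class loader.
    public static <T> Predicate<T> withContextClassLoader(ClassLoader loader, Predicate<T> delegate) {
        return item -> {
            Thread thread = Thread.currentThread();
            ClassLoader previous = thread.getContextClassLoader();
            thread.setContextClassLoader(loader);
            try {
                return delegate.test(item);
            } finally {
                // Restore the previous loader so pooled worker threads are left unchanged.
                thread.setContextClassLoader(previous);
            }
        };
    }

    // Filters in parallel while pinning the caller's context class loader
    // on whichever thread evaluates each element.
    public static <T> List<T> filterInParallel(List<T> items, Predicate<T> predicate) {
        ClassLoader callerLoader = Thread.currentThread().getContextClassLoader();
        return items.parallelStream()
            .filter(withContextClassLoader(callerLoader, predicate))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stand-in for the application class loader set on the calling thread.
        ClassLoader appLoader =
            new URLClassLoader(new URL[0], ContextClassLoaderSketch.class.getClassLoader());
        Thread.currentThread().setContextClassLoader(appLoader);

        List<String> datasets = Arrays.asList("db1.table_a", "db1.table_b", "db2.table_c");
        // The predicate keeps an element only if it saw the caller's loader on its thread.
        List<String> kept = filterInParallel(datasets,
            name -> Thread.currentThread().getContextClassLoader() == appLoader);
        System.out.println("elements that saw the caller's loader: " + kept.size());
    }
}
```

A simpler alternative, at the cost of parallelism, would be to use a sequential 
stream so that every predicate runs on the calling thread.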



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
