Vikram Bohra created GOBBLIN-1845:
-------------------------------------

             Summary: Java parallel stream usage causes class loader conflict when run with Spark
                 Key: GOBBLIN-1845
                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1845
             Project: Apache Gobblin
          Issue Type: Task
            Reporter: Vikram Bohra


DatasetsFinderFilteringDecorator uses a parallel stream to filter datasets by
predicates. When this code runs in Spark, the system class loader is used to
pick up the Hive jar instead of the current context class loader, which leads
to ClassNotFound issues.

Stack trace:
{code:java}
Caused by: MetaException(message:org.apache.hadoop.hive.metastore.HiveMetaStoreClient class not found)
        at org.apache.hadoop.hive.metastore.MetaStoreUtils.getClass(MetaStoreUtils.java:1494)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:130)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:98)
        at org.apache.gobblin.hive.HiveMetaStoreClientFactory.createMetaStoreClient(HiveMetaStoreClientFactory.java:100)
        at org.apache.gobblin.hive.HiveMetaStoreClientFactory.create(HiveMetaStoreClientFactory.java:106)
 {code}
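
One possible workaround, sketched here only as an illustration and not as the fix adopted in Gobblin: capture the caller's context class loader before starting the parallel stream and install it on whichever thread evaluates each predicate. Parallel streams run on the common ForkJoinPool, whose worker threads may carry a different context class loader than the calling thread, which matches the behavior described above. The class name ContextClassLoaderFilter and the method filterWithCallerClassLoader are hypothetical and do not appear in the Gobblin code base.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of Gobblin.
final class ContextClassLoaderFilter {

    private ContextClassLoaderFilter() {
    }

    /**
     * Filters items with a parallel stream while making every thread that
     * evaluates the predicate temporarily use the caller's context class loader.
     */
    static <T> List<T> filterWithCallerClassLoader(List<T> items, Predicate<T> predicate) {
        // Captured on the calling thread, before any pool thread is involved.
        final ClassLoader callerLoader = Thread.currentThread().getContextClassLoader();

        return items.parallelStream().filter(item -> {
            Thread worker = Thread.currentThread();
            ClassLoader previous = worker.getContextClassLoader();
            worker.setContextClassLoader(callerLoader);
            try {
                return predicate.test(item);
            } finally {
                // Restore the loader so shared pool threads are left unchanged.
                worker.setContextClassLoader(previous);
            }
        }).collect(Collectors.toList());
    }
}
```

A simpler alternative, if parallelism is not essential for the filtering step, would be to use a sequential stream so that the predicates run on the calling thread with its own context class loader.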



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
