[ https://issues.apache.org/jira/browse/KYLIN-5126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17446343#comment-17446343 ]

xbchao commented on KYLIN-5126:
-------------------------------

Advice from the AWS engineer:
I need to explain that Kylin is not an application supported by EMR. We cannot 
ensure that a Kylin installation you set up yourself will be compatible with, 
and not conflict with, the dependency packages in EMR. Troubleshooting problems 
related to the Kylin service itself is also outside the scope of AWS Support. 
In addition, different Kylin versions and installation methods may behave 
differently, and it is currently not clear how you installed Kylin or how 
dependencies are introduced into your code. Given the above, we can only 
suggest some directions for you to try, on a best-effort basis.
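I assume that your code creates a Hive client to connect to the Hive 
metastore. However, the client cannot be created: 
AWSGlueDataCatalogHiveClientFactory is invoked, but a class it depends on, 
com.amazonaws.AmazonServiceException (from the AWS Java SDK), cannot be found, 
so the following error occurs: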
======================================================= 
Caused by: java.lang.NoClassDefFoundError: com/amazonaws/AmazonServiceException
    at com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory.createMetaStoreClient(AWSGlueDataCatalogHiveClientFactory.java:16)
    at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3113)
    at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3148)
    at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1244)
    at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:183)
    at org.apache.hadoop.hive.ql.metadata.Hive.<clinit>(Hive.java:175)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
    at org.apache.spark.sql.hive.client.HiveClientImpl.newState(HiveClientImpl.scala:185)
    ... 95 more
=======================================================
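One way to see which jar, if any, in a set of candidate jars provides the class named in a NoClassDefFoundError is to look for the matching `.class` entry inside each jar, since jars are zip archives. A minimal Python sketch; the helper names and the example directory are illustrative assumptions, not part of Kylin or EMR:

```python
import glob
import zipfile
from pathlib import Path
from typing import Iterable, List


def class_entry(class_name: str) -> str:
    """Map 'com.amazonaws.AmazonServiceException' (or the slash form used in
    NoClassDefFoundError messages) to the entry name stored inside a jar."""
    return class_name.replace(".", "/") + ".class"


def jars_providing(class_name: str, jar_paths: Iterable[str]) -> List[str]:
    """Return, in the given order, the jars that contain the class."""
    entry = class_entry(class_name)
    found = []
    for jar in jar_paths:
        if not Path(jar).is_file():
            continue  # missing paths and directories cannot be searched as jars
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    found.append(jar)
        except zipfile.BadZipFile:
            continue  # not a readable jar/zip archive; skip it
    return found


if __name__ == "__main__":
    # Example directory only; adjust to the jars actually present on your node.
    candidates = sorted(glob.glob("/usr/share/aws/aws-java-sdk/*.jar"))
    print(jars_providing("com/amazonaws/AmazonServiceException", candidates))
```

An empty result means none of the searched jars can satisfy the missing class, so adding them to the job's classpath would not help either.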
Checking the resource configuration of your EMR cluster j-ID, I can confirm 
that both the Hive metastore and the Spark metastore of the cluster use the 
AWS Glue Data Catalog. In such an environment, the default Spark and Hive 
configuration is changed, and the relevant extraClassPath settings point to the 
AWSGlueDataCatalogHiveClient family of dependency packages, as can be seen in 
the Spark configuration:
======================================================= 
$ vi /etc/spark/conf/spark-defaults.conf 
spark.master                     yarn
spark.driver.extraClassPath      /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar
spark.driver.extraLibraryPath    /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native
spark.executor.extraClassPath    /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar
spark.executor.extraLibraryPath  /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native
======================================================= 
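To compare the classpath a job actually uses with the one EMR configures, it can help to read the extraClassPath values out of spark-defaults.conf programmatically. A minimal Python sketch, assuming the whitespace-separated `key value` layout of the EMR file shown above with `#` comments; the helper names are hypothetical:

```python
from typing import Dict, List

GLUE_CLIENT_JAR = "/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar"


def parse_spark_defaults(text: str) -> Dict[str, str]:
    """Parse spark-defaults.conf content into a {key: value} dict."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)  # key, then everything after the first whitespace run
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def classpath_entries(conf: Dict[str, str], key: str) -> List[str]:
    """Split a colon-separated classpath property into its non-empty entries."""
    return [e for e in conf.get(key, "").split(":") if e]


def missing_glue_client(conf: Dict[str, str]) -> List[str]:
    """Return the extraClassPath keys whose value does not list the Glue client jar."""
    keys = ["spark.driver.extraClassPath", "spark.executor.extraClassPath"]
    return [k for k in keys if GLUE_CLIENT_JAR not in classpath_entries(conf, k)]
```

For example, `missing_glue_client(parse_spark_defaults(open("/etc/spark/conf/spark-defaults.conf").read()))` should return an empty list on a node whose defaults match the configuration above; running the same check against the configuration the Kylin job is launched with would show whether it dropped these entries.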
From the classpath used when the kylin.engine.spark.application task runs, the 
aws-glue-datacatalog-spark-client.jar under /usr/share/aws/hmclient/lib/ 
(listed above) does not appear to be included. The likely cause of the error is 
therefore that, although the dependencies were copied when the image was built, 
the classpath used when the task was submitted did not point to the relevant 
directory. This produced the NoClassDefFoundError, and the 
AWSGlueDataCatalogHiveClient could not be created.
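This hypothesis can be checked on the node that launches the job by resolving each classpath entry the way the JVM does: an entry whose last component is `*` stands for every `.jar` or `.JAR` file directly in that directory (not subdirectories), and any other entry must exist as a file or directory. A minimal Python sketch; `resolve_classpath` is a hypothetical helper, and the classpath string has to be taken from however the Kylin Spark job is actually launched, which this issue does not show:

```python
import os
from typing import List, Tuple


def resolve_classpath(classpath: str) -> Tuple[List[str], List[str]]:
    """Expand a colon-separated JVM classpath (Linux separator).

    Returns (resolved, unresolved): resolved holds existing files/directories,
    with wildcard entries expanded to the jars they match; unresolved holds
    entries that match nothing on this machine.
    """
    resolved, unresolved = [], []
    for entry in filter(None, classpath.split(":")):
        if os.path.basename(entry) == "*":
            directory = os.path.dirname(entry) or "."
            names = os.listdir(directory) if os.path.isdir(directory) else []
            # Like the JVM wildcard: only regular files ending in .jar/.JAR, no recursion.
            jars = sorted(
                os.path.join(directory, n)
                for n in names
                if n.endswith((".jar", ".JAR")) and os.path.isfile(os.path.join(directory, n))
            )
            if jars:
                resolved.extend(jars)
            else:
                unresolved.append(entry)
        elif os.path.exists(entry):
            resolved.append(entry)
        else:
            unresolved.append(entry)
    return resolved, unresolved
```

Any Glue or AWS SDK path that ends up in `unresolved`, or is absent from the job's classpath altogether, is a candidate for the missing class.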
To solve this problem, we recommend checking the classpath of the dependency 
packages referenced when the kylin.engine.spark.application task runs, and 
adding the extraClassPath entries that EMR sets by default in 
spark-defaults.conf. This should allow the relevant dependency packages to be 
found so that the AWSGlueDataCatalogHiveClient can be created.
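One way to act on this recommendation, assuming Kylin 4 forwards `kylin.engine.spark-conf.<spark property>` entries from kylin.properties to the Spark build job (verify this against the configuration documentation for your Kylin version), is to generate overrides that append EMR's default paths to whatever the job already uses. A minimal Python sketch; the function names are hypothetical:

```python
from typing import Dict, List

# Assumption: Kylin 4 passes "kylin.engine.spark-conf.<spark key>" entries
# from kylin.properties into the Spark build job's configuration.
KYLIN_SPARK_CONF_PREFIX = "kylin.engine.spark-conf."

PATH_KEYS = (
    "spark.driver.extraClassPath",
    "spark.executor.extraClassPath",
    "spark.driver.extraLibraryPath",
    "spark.executor.extraLibraryPath",
)


def merge_paths(existing: str, emr_default: str) -> str:
    """Append EMR default entries not already present, preserving order."""
    entries = [e for e in existing.split(":") if e]
    for e in emr_default.split(":"):
        if e and e not in entries:
            entries.append(e)
    return ":".join(entries)


def kylin_overrides(emr_defaults: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Build kylin.properties lines carrying EMR's class/library paths into the build job.

    emr_defaults: values from EMR's spark-defaults.conf
    current: values already configured for the Kylin build job (may be empty)
    """
    lines = []
    for key in PATH_KEYS:
        if key not in emr_defaults:
            continue  # EMR sets nothing for this key; leave the job's value alone
        merged = merge_paths(current.get(key, ""), emr_defaults[key])
        lines.append(f"{KYLIN_SPARK_CONF_PREFIX}{key}={merged}")
    return lines
```

Appending EMR's entries after existing ones keeps the job's own jars first in lookup order; whether that is the right precedence depends on which dependency versions should win, which this issue does not settle.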

> Build kylin 4.0, spark has not been able to submit to the yarn cluster
> ----------------------------------------------------------------------
>
>                 Key: KYLIN-5126
>                 URL: https://issues.apache.org/jira/browse/KYLIN-5126
>             Project: Kylin
>          Issue Type: Bug
>            Reporter: xbchao
>            Priority: Major
>
> When I built Kylin 4.0, the Spark job could not be submitted to the YARN 
> cluster. The version used was apache-kylin-4.0.0-bin-spark2, deployed on an 
> AWS EMR server.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
