[
https://issues.apache.org/jira/browse/KYLIN-5126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17446343#comment-17446343
]
xbchao commented on KYLIN-5126:
-------------------------------
The AWS engineer's advice:
Please note that Kylin is not an application supported by EMR. We cannot
guarantee that a self-installed Kylin will be compatible with, and not conflict
with, the dependency packages shipped with EMR. Troubleshooting problems in the
Kylin service itself is also outside the scope of AWS Support. In addition,
different Kylin versions and installation methods may behave differently, and it
is currently unclear how you installed Kylin or how its dependencies are
introduced into the code. For these reasons, we can only suggest some directions
for you to try, on a best-effort basis.
I suspect that your code creates a Hive client to connect to the Hive
metastore, and that creating it through AWSGlueDataCatalogHiveClientFactory
fails because a class it depends on, com.amazonaws.AmazonServiceException,
cannot be loaded:
=======================================================
Caused by: java.lang.NoClassDefFoundError: com/amazonaws/AmazonServiceException
    at com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory.createMetaStoreClient(AWSGlueDataCatalogHiveClientFactory.java:16)
    at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3113)
    at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3148)
    at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1244)
    at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:183)
    at org.apache.hadoop.hive.ql.metadata.Hive.<clinit>(Hive.java:175)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
    at org.apache.spark.sql.hive.client.HiveClientImpl.newState(HiveClientImpl.scala:185)
    ... 95 more
=======================================================
I checked the resource configuration of your EMR cluster j-ID and can confirm
that both the Hive metastore and the Spark metastore of the cluster use the AWS
Glue Data Catalog. In such an environment, the default Spark and Hive
configuration is changed, and the related extraClassPath entries also point to
the AWSGlueDataCatalogHiveClient family of dependency packages, as can be seen
in spark-defaults.conf:
=======================================================
$ vi /etc/spark/conf/spark-defaults.conf
spark.master                     yarn
spark.driver.extraClassPath      /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar
spark.driver.extraLibraryPath    /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native
spark.executor.extraClassPath    /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar
spark.executor.extraLibraryPath  /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native
=======================================================
Judging from the classpath searched when the kylin.engine.spark.application task
is running, there appears to be no aws-glue-datacatalog-spark-client.jar under
the /usr/share/aws/hmclient/lib/ directory listed above. The possible cause of
the error is therefore that, although the dependencies were copied when the
image was built, the classpath used when the task was submitted did not point to
the relevant directory. As a result, the required classes could not be loaded
(the NoClassDefFoundError above), and the AWSGlueDataCatalogHiveClient could not
be created.
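One way to test this diagnosis is to take the classpath that the kylin.engine.spark.application job actually ran with (for example, from the Environment page of its Spark UI) and check whether its entries exist and which of them contain the class named in the stack trace. The script below is a minimal, hypothetical helper for that purpose; the names `expand_classpath`, `find_class`, and `missing_entries` are made up here. It approximates the JVM rule that a trailing `dir/*` entry matches the `.jar` files directly inside `dir`, and it assumes a Linux-style, colon-separated classpath.

```python
import glob
import os
import sys
import zipfile
from typing import List


def expand_classpath(classpath: str, sep: str = ":") -> List[str]:
    """Expand a Java-style classpath into concrete entries.

    A trailing '*' is treated like the JVM wildcard: it matches every
    .jar file directly inside that directory (not recursively).
    """
    entries = []
    for part in classpath.split(sep):
        if not part:
            continue
        if part.endswith("*"):
            directory = part[:-1] or "."
            entries.extend(sorted(glob.glob(os.path.join(directory, "*.jar"))))
        else:
            entries.append(part)
    return entries


def missing_entries(classpath: str, sep: str = ":") -> List[str]:
    """Return raw classpath entries that match nothing on disk."""
    return [p for p in classpath.split(sep) if p and not glob.glob(p)]


def find_class(classpath: str, class_name: str, sep: str = ":") -> List[str]:
    """Return the classpath entries (jars or directories) containing class_name."""
    entry_name = class_name.replace(".", "/") + ".class"
    hits = []
    for entry in expand_classpath(classpath, sep):
        if os.path.isdir(entry):
            # Directory entries hold classes as plain files.
            if os.path.isfile(os.path.join(entry, entry_name)):
                hits.append(entry)
        elif zipfile.is_zipfile(entry):  # False for missing files
            with zipfile.ZipFile(entry) as jar:
                if entry_name in jar.namelist():
                    hits.append(entry)
    return hits


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 find_class.py "<classpath>" <fully.qualified.ClassName>
    cp, cls = sys.argv[1], sys.argv[2]
    for entry in missing_entries(cp):
        print("missing:", entry)
    for entry in find_class(cp, cls):
        print("found in:", entry)
```

For example, running it with the job's classpath and `com.amazonaws.AmazonServiceException` should list a jar under /usr/share/aws/aws-java-sdk/ if the AWS SDK is reachable; an empty result suggests that directory is not on the job's classpath.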
To solve this problem, we recommend checking the classpath of the dependency
packages referenced when the kylin.engine.spark.application task runs, and
adding the extraClassPath entries that EMR sets by default in
spark-defaults.conf. This should let the task find the dependency packages it
needs to create the AWSGlueDataCatalogHiveClient.
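The recommendation above can be sketched as a small conversion step, assuming that Kylin 4 forwards properties prefixed with `kylin.engine.spark-conf.` in `kylin.properties` to the Spark build job (please verify this prefix against the configuration documentation for your Kylin version). The hypothetical helper reads EMR's spark-defaults.conf, assuming whitespace-separated `key value` lines as in the listing above, and prints kylin.properties lines for the four extraClassPath/extraLibraryPath keys. If Kylin already sets one of those keys, the sketch appends EMR's entries after Kylin's own rather than overwriting them.

```python
import os
import sys
from typing import Dict, List, Optional

# Assumption: Kylin 4 passes kylin.engine.spark-conf.<key> through as <key>
# to the build job's Spark configuration. Verify for your Kylin version.
KYLIN_PREFIX = "kylin.engine.spark-conf."

COPY_KEYS = (
    "spark.driver.extraClassPath",
    "spark.driver.extraLibraryPath",
    "spark.executor.extraClassPath",
    "spark.executor.extraLibraryPath",
)


def parse_spark_defaults(text: str) -> Dict[str, str]:
    """Parse 'key value' lines (whitespace-separated), skipping '#' comments."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def to_kylin_properties(emr_conf: Dict[str, str],
                        existing_kylin: Optional[Dict[str, str]] = None) -> List[str]:
    """Build kylin.properties lines carrying EMR's default class/library paths."""
    existing_kylin = existing_kylin or {}
    lines = []
    for key in COPY_KEYS:
        if key not in emr_conf:
            continue
        kylin_key = KYLIN_PREFIX + key
        value = emr_conf[key]
        current = existing_kylin.get(kylin_key)
        if current:
            # Keep Kylin's own entries first, then append EMR's defaults.
            value = current + ":" + value
        lines.append(f"{kylin_key}={value}")
    return lines


if __name__ == "__main__" and len(sys.argv) == 2 and os.path.isfile(sys.argv[1]):
    # Usage: python3 emr_to_kylin.py /etc/spark/conf/spark-defaults.conf
    with open(sys.argv[1]) as f:
        print("\n".join(to_kylin_properties(parse_spark_defaults(f.read()))))
```

Putting Kylin's entries first is a judgment call: it favours Kylin's bundled jars when both sides ship the same class, which may or may not be what your deployment needs given the dependency-conflict caveat above.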
> Build kylin 4.0, spark has not been able to submit to the yarn cluster
> ----------------------------------------------------------------------
>
> Key: KYLIN-5126
> URL: https://issues.apache.org/jira/browse/KYLIN-5126
> Project: Kylin
> Issue Type: Bug
> Reporter: xbchao
> Priority: Major
>
> When I built Kylin 4.0, the Spark job could not be submitted to the YARN
> cluster. The version used was apache-kylin-4.0.0-bin-spark2, deployed on an
> AWS EMR server.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)