michetti commented on issue #1789: URL: https://github.com/apache/hudi/issues/1789#issuecomment-658373503
Hey @GrigorievNick, I saw the issue was closed, but if I understood correctly, the link you posted is about AWS Athena and how it can work with Hudi tables registered in the AWS Glue catalog, while this issue is about getting Hudi to work on AWS Glue Jobs (the AWS serverless Spark service). Did I miss something?

I was having the same error as @WilliamWhispell, and from what I could find, it seems to be related to a version mismatch between the org.eclipse.jetty jars required by Hudi and the ones in the AWS Glue Jobs runtime.

For example, the Timeline service depends on Javalin 2.8.0, which in turn requires Jetty version 9.4.15.v20190215:
- https://github.com/apache/hudi/blob/release-0.5.3/hudi-timeline-service/pom.xml#L111
- https://github.com/tipsy/javalin/blob/javalin-2.8.0/pom.xml#L43

While Spark 2.4.3 (the version the Glue Jobs 1.0 runtime uses) depends on Jetty version 9.3.24.v20180605:
- https://github.com/apache/spark/blob/v2.4.3/pom.xml#L137

I got it working by shading (relocating) _org.eclipse.jetty._ in the spark-bundle, by adding the following [here](https://github.com/apache/hudi/blob/release-0.5.3/packaging/hudi-spark-bundle/pom.xml#L99):

```xml
<relocation>
  <pattern>org.eclipse.jetty.</pattern>
  <shadedPattern>org.apache.hudi.org.eclipse.jetty.</shadedPattern>
</relocation>
```

@WilliamWhispell, I'm not sure whether there is a better way, but with Hudi 0.5.3 on AWS Glue Jobs 1.0, I needed the following jars:
- httpclient-4.5.12.jar (due to [this](https://forums.aws.amazon.com/thread.jspa?messageID=930176) other error)
- spark-avro_2.11-2.4.3.jar
- hudi-spark-bundle_2.11-0.5.3.jar (your own build, with the changes above)

And remember that you also need to configure Spark the way it is described in the Hudi documentation:

```scala
import com.amazonaws.services.glue.GlueContext
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession

// Settings from the Hudi documentation: Kryo serialization, no Parquet conversion for Hive tables
val sparkConf: SparkConf = new SparkConf()
sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
sparkConf.set("spark.sql.hive.convertMetastoreParquet", "false")
val sparkContext: SparkContext = new SparkContext(sparkConf)
val glueContext: GlueContext = new GlueContext(sparkContext)
val spark: SparkSession = glueContext.getSparkSession
```
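
In case it helps with debugging the Jetty mismatch, here is a rough sketch for checking which jar a class is actually loaded from at runtime. It only uses plain JVM reflection. `JettyOrigin` and `originOf` are names I made up for illustration, and the shaded class name assumes the relocation pattern shown earlier in this comment, so adjust it if your bundle is shaded differently.

```scala
object JettyOrigin {
  // Returns the location (jar or directory) a class was loaded from,
  // or None if the class is not on the classpath or has no code source
  // (for example, JDK bootstrap classes).
  def originOf(className: String): Option[String] =
    try {
      val cls = Class.forName(className)
      Option(cls.getProtectionDomain.getCodeSource)
        .flatMap(cs => Option(cs.getLocation))
        .map(_.toString)
    } catch {
      case _: ClassNotFoundException | _: LinkageError => None
    }

  def main(args: Array[String]): Unit = {
    // Unshaded Jetty (expected to come from the Spark / Glue runtime) and the
    // relocated copy (assumed to live inside the custom hudi-spark-bundle).
    val candidates = Seq(
      "org.eclipse.jetty.server.Server",
      "org.apache.hudi.org.eclipse.jetty.server.Server"
    )
    candidates.foreach { name =>
      println(s"$name -> ${originOf(name).getOrElse("not found")}")
    }
  }
}
```

If the relocation worked as intended, I would expect the second class to resolve to the custom bundle jar and the first one to whatever Jetty the runtime ships, but I have only reasoned about this, not verified it on Glue.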
