michetti edited a comment on issue #1789:
URL: https://github.com/apache/hudi/issues/1789#issuecomment-658373503


   Hey @GrigorievNick, I saw the issue was closed, but if I understood correctly, the link you posted is about AWS Athena and how it can query Hudi tables registered in the AWS Glue catalog, while this issue is about getting Hudi to work on AWS Glue Jobs (the AWS serverless Spark service). Am I missing something?
   
   I was having the same error as @WilliamWhispell, and from what I could find, it seems to be caused by a version mismatch between the `org.eclipse.jetty` jars required by Hudi and the ones shipped with the AWS Glue Jobs runtime.
   
   For example, the Hudi timeline service depends on Javalin 2.8.0, which in turn requires Jetty 9.4.15.v20190215:
   - 
https://github.com/apache/hudi/blob/release-0.5.3/hudi-timeline-service/pom.xml#L111
   - https://github.com/tipsy/javalin/blob/javalin-2.8.0/pom.xml#L43
   
   Spark 2.4.3 (the version the Glue Jobs 1.0 runtime uses), on the other hand, depends on Jetty 9.3.24.v20180605:
   - https://github.com/apache/spark/blob/v2.4.3/pom.xml#L137
   
   I got it working by shading `org.eclipse.jetty.` in the spark-bundle, adding the following relocation [here](https://github.com/apache/hudi/blob/release-0.5.3/packaging/hudi-spark-bundle/pom.xml#L99):
   ```xml
   <relocation>
     <pattern>org.eclipse.jetty.</pattern>
     <shadedPattern>org.apache.hudi.org.eclipse.jetty.</shadedPattern>
   </relocation>
   ```
   
   @WilliamWhispell, I'm not sure there is a better way, but with Hudi 0.5.3 on 
AWS Glue Jobs 1.0, I needed the following jars:
   - httpclient-4.5.12.jar (due to 
[this](https://forums.aws.amazon.com/thread.jspa?messageID=930176) other error)
   - spark-avro_2.11-2.4.3.jar
   - hudi-spark-bundle_2.11-0.5.3.jar (your own, with the changes above)
   
   Also remember that you need to configure Spark as described in the Hudi documentation:
   ```scala
   import com.amazonaws.services.glue.GlueContext
   import org.apache.spark.{SparkConf, SparkContext}
   import org.apache.spark.sql.SparkSession

   val sparkConf: SparkConf = new SparkConf()
   sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
   sparkConf.set("spark.sql.hive.convertMetastoreParquet", "false")

   val sparkContext: SparkContext = new SparkContext(sparkConf)
   val glueContext: GlueContext = new GlueContext(sparkContext)
   val spark: SparkSession = glueContext.getSparkSession
   ```
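   For completeness, once the session is configured, writing to a Hudi table uses the usual datasource options. A minimal sketch (the table name, field names, and S3 path below are placeholders for illustration, not from this issue):
   ```scala
   // Hedged sketch: the write options typically needed with Hudi 0.5.3.
   // All values here are placeholders; substitute your own table/fields/path.
   val hudiOptions = Map(
     "hoodie.table.name"                           -> "my_table",
     "hoodie.datasource.write.recordkey.field"     -> "id",
     "hoodie.datasource.write.partitionpath.field" -> "creation_date",
     "hoodie.datasource.write.precombine.field"    -> "last_update_time"
   )

   // With the `spark` session from above, a write would then look like:
   // df.write
   //   .format("org.apache.hudi")
   //   .options(hudiOptions)
   //   .mode(SaveMode.Append)
   //   .save("s3://my-bucket/my_table")
   ```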

