WilliamWhispell opened a new issue #1789:
URL: https://github.com/apache/hudi/issues/1789
**Describe the problem you faced**
I'm trying to run a Hudi write inside a Glue job. My understanding is that
Glue 1.0 uses Spark 2.4.3 and Hadoop 2.8.5.
I've added `hudi-spark-bundle_2.11-0.5.3.jar` and `spark-avro_2.11-2.4.3.jar` as
dependent jars on the Glue job.
However, the job often fails with:

```
class threw exception: java.lang.NoSuchMethodError: org.eclipse.jetty.util.thread.QueuedThreadPool.<init>(III)V
	at io.javalin.core.util.JettyServerUtil.defaultServer(JettyServerUtil.kt:43)
	at io.javalin.Javalin.<init>(Javalin.java:94)
	at io.javalin.Javalin.create(Javalin.java:107)
	at org.apache.hudi.timeline.service.TimelineService.startService(TimelineService.java:102)
	at org.apache.hudi.client.embedded.EmbeddedTimelineService.startServer(EmbeddedTimelineService.java:74)
	at org.apache.hudi.client.AbstractHoodieClient.startEmbeddedServerView(AbstractHoodieClient.java:102)
	at org.apache.hudi.client.AbstractHoodieClient.<init>(AbstractHoodieClient.java:69)
	at org.apache.hudi.client.AbstractHoodieWriteClient.<init>(AbstractHoodieWriteClient.java:83)
	at org.apache.hudi.client.HoodieWriteClient.<init>(HoodieWriteClient.java:137)
	at org.apache.hudi.client.HoodieWriteClient.<init>(HoodieWriteClient.java:124)
	at org.apache.hudi.client.HoodieWriteClient.<init>(HoodieWriteClient.java:120)
	at org.apache.hudi.DataSourceUtils.createHoodieClient(DataSourceUtils.java:195)
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:135)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:108)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
	at GlueApp$.main(script_2020-07-03-14-45-41.scala:84)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.amazonaws.services.glue.util.GlueExceptionWrapper$$anonfun$1.apply$mcV$sp(GlueExceptionWrapper.scala:35)
	at com.amazonaws.
```
This makes me think I have some type of dependency issue.
Reading over the release notes
(https://hudi.apache.org/releases.html#migration-guide-for-this-release-2), the
only Spark requirement I could find was: "IMPORTANT: This version requires
your runtime spark version to be upgraded to 2.4+."
So I would expect this to work on Spark 2.4.3, but I'm not sure whether the two
jars I added are all that is needed.
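To narrow down where the conflicting Jetty class comes from, a small diagnostic can be run inside the job before the Hudi write. This is only a sketch for checking the classpath; it assumes nothing beyond the standard JVM reflection API:

```scala
// Diagnostic sketch: print which jar actually supplies QueuedThreadPool
// on the Glue classpath. A NoSuchMethodError on <init>(III)V usually means
// an older/newer Jetty than the one Javalin (used by Hudi's timeline
// service) was compiled against is winning classloading.
val clazz = Class.forName("org.eclipse.jetty.util.thread.QueuedThreadPool")
println(clazz.getProtectionDomain.getCodeSource.getLocation)
```

If the printed location is a Glue- or Spark-provided jar rather than the Hudi bundle, that would confirm a Jetty version conflict.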
Here is what my code looks like (Scala 2.11):

```scala
object GlueApp {
  def main(sysArgs: Array[String]) {
    val sc: SparkContext = new SparkContext()
    val glueContext: GlueContext = new GlueContext(sc)
    val spark: SparkSession = glueContext.getSparkSession
    // @params: [JOB_NAME]
    val args = GlueArgParser.getResolvedOptions(sysArgs, Seq("JOB_NAME",
      "input_file", "schema_file", "target_table", "target_s3_path",
      "save_mode").toArray)
    Job.init(args("JOB_NAME"), glueContext, args.asJava)
    ...
    df.write
      .format("org.apache.hudi")
      .options(hudiOptions)
      .option("hoodie.consistency.check.enabled", "true")
      .mode(saveMode)
      .save(s3SaveLocation)
  }
}
```
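Since every frame above the Spark frames in the stack trace goes through Hudi's embedded timeline server (which starts a Javalin/Jetty instance), one workaround worth trying is to disable that server so Jetty is never started. This is a sketch under the assumption that the conflict is confined to the timeline service; `hoodie.embed.timeline.server` is a standard Hudi write config, while `hudiOptions`, `saveMode`, and `s3SaveLocation` are the names from the snippet above:

```scala
// Workaround sketch: skip starting the embedded timeline server,
// so the conflicting Jetty constructor is never invoked.
val hudiOptionsNoTimeline =
  hudiOptions + ("hoodie.embed.timeline.server" -> "false")

df.write
  .format("org.apache.hudi")
  .options(hudiOptionsNoTimeline)
  .option("hoodie.consistency.check.enabled", "true")
  .mode(saveMode)
  .save(s3SaveLocation)
```

Disabling the timeline server may cost some file-listing performance on large tables, but it sidesteps the Jetty dependency entirely.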
**Environment Description**
* Hudi version : 0.5.3
* Spark version : 2.4.3
* Hive version : ?
* Hadoop version : 2.8.5
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]