hudi-bot opened a new issue, #15770: URL: https://github.com/apache/hudi/issues/15770
I am trying to use the hudi-spark3.3 bundle on an EMR cluster with OSS Spark. I used [https://archive.apache.org/dist/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz] for Spark.

{code:java}
./bin/spark-shell --driver-memory 4g --executor-memory 6g --master yarn --deploy-mode client \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
  --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
  --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
  --jars /home/hadoop/hudi-spark3.3-bundle_2.12-0.13.0-rc2.jar \
  --conf spark.driver.extraJavaOptions="-Dlog4j.configuration=file:/home/hadoop/log4j.properties" \
  --conf spark.executor.extraJavaOptions="-Dlog4j.configuration=file:/home/hadoop/log4j.properties"

23/02/08 21:20:34 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
java.lang.NoClassDefFoundError: org/apache/hadoop/shaded/javax/ws/rs/core/NoContentException
  at org.apache.hadoop.yarn.util.timeline.TimelineUtils.<clinit>(TimelineUtils.java:60)
  at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:200)
  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
  at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:191)
  at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
  at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:222)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:585)
  at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2704)
  at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:953)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:947)
  at org.apache.spark.repl.Main$.createSparkSession(Main.scala:106)
  ... 55 elided
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.shaded.javax.ws.rs.core.NoContentException
  at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
  ... 67 more
<console>:14: error: not found: value spark
       import spark.implicits._
              ^
<console>:14: error: not found: value spark
       import spark.sql
              ^
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.3.0
      /_/

Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 1.8.0_352)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
{code}

From Stack Overflow, I found that the error can be bypassed by disabling the timeline server in YARN:
[https://stackoverflow.com/questions/74451254/caused-by-java-lang-classnotfoundexception-org-apache-hadoop-shaded-javax-ws-r]

I had to set yarn.timeline-service.enabled = false in /etc/hadoop/conf/yarn-site.xml. After this, I don't see any issues.

I tried all versions and it's the same behavior: hudi-0.12.0, 0.12.2 and 0.13.0 rc2.

## JIRA info

- Link: https://issues.apache.org/jira/browse/HUDI-5732
- Type: Bug
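
For reference, the yarn-site.xml workaround described in this report comes down to a single property. The fragment below is a minimal sketch, not a verified configuration: the property name and value come from the report, while its placement inside the file's existing `<configuration>` element, and the assumption that the property is not already defined elsewhere in the file, are mine.

```xml
<!-- /etc/hadoop/conf/yarn-site.xml, inside the existing <configuration> element -->
<property>
  <!-- Disables the YARN timeline service client, the workaround named in this report. -->
  <name>yarn.timeline-service.enabled</name>
  <value>false</value>
</property>
```

Spark copies `spark.hadoop.*` properties into the Hadoop configuration, so passing `--conf spark.hadoop.yarn.timeline-service.enabled=false` to spark-shell may achieve the same effect for a single session without editing the cluster file. That variant was not tried in this report.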
