Here's a workaround:
- Download this jar <http://repo1.maven.org/maven2/org/apache/avro/avro-mapred/1.7.7/avro-mapred-1.7.7-hadoop2.jar> and add it to the SPARK_CLASSPATH on all workers.
- Make sure the jar is present at the same path on every worker.
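For reference, a minimal `conf/spark-env.sh` sketch of that workaround (the install path below is an assumption, not from the thread -- the real requirement is only that it be identical on every worker):

```shell
# conf/spark-env.sh on EVERY worker -- sketch, hypothetical install path.
# /opt/spark-extra-jars is assumed; adjust to your layout, but keep it
# identical across workers so every executor picks up the hadoop2 build
# of avro-mapred ahead of any Hadoop 1 copy bundled elsewhere.
export SPARK_CLASSPATH=/opt/spark-extra-jars/avro-mapred-1.7.7-hadoop2.jar:$SPARK_CLASSPATH
```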
Thanks
Best Regards

On Thu, Mar 5, 2015 at 10:27 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:

> I am trying to read RDD avro, transform and write.
> I am able to run it locally fine but when i run onto cluster, i see issues
> with Avro.
>
> export SPARK_HOME=/home/dvasthimal/spark/spark-1.0.2-bin-2.4.1
> export SPARK_YARN_USER_ENV="CLASSPATH=/apache/hadoop/conf"
> export HADOOP_CONF_DIR=/apache/hadoop/conf
> export YARN_CONF_DIR=/apache/hadoop/conf
> export SPARK_JAR=$SPARK_HOME/lib/spark-assembly-1.0.2-hadoop2.4.1.jar
> export SPARK_LIBRARY_PATH=/apache/hadoop/lib/native
> export SPARK_YARN_USER_ENV="CLASSPATH=/apache/hadoop/conf"
> export SPARK_YARN_USER_ENV="CLASSPATH=/apache/hadoop/conf"
> export SPARK_CLASSPATH=/apache/hadoop/share/hadoop/common/hadoop-common-2.4.1-company-2.jar:/apache/hadoop/lib/hadoop-lzo-0.6.0.jar:/home/dvasthimal/spark/avro-mapred-1.7.7-hadoop2.jar:/home/dvasthimal/spark/avro-1.7.7.jar
> export SPARK_LIBRARY_PATH="/apache/hadoop/lib/native"
> export YARN_CONF_DIR=/apache/hadoop/conf/
>
> cd $SPARK_HOME
>
> ./bin/spark-submit --master yarn-cluster --jars /home/dvasthimal/spark/avro-mapred-1.7.7-hadoop2.jar,/home/dvasthimal/spark/avro-1.7.7.jar --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 1 --queue hdmi-spark --class com.company.ep.poc.spark.reporting.SparkApp /home/dvasthimal/spark/spark_reporting-1.0-SNAPSHOT.jar startDate=2015-02-16 endDate=2015-02-16 epoutputdirectory=/user/dvasthimal/epdatasets_small/exptsession subcommand=successevents outputdir=/user/dvasthimal/epdatasets/successdetail
>
> Spark assembly has been built with Hive, including Datanucleus jars on classpath
> 15/03/04 03:20:29 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
> 15/03/04 03:20:30 INFO yarn.Client: Got Cluster metric info from ApplicationsManager (ASM), number of NodeManagers: 2221
> 15/03/04 03:20:30 INFO yarn.Client: Queue info ...
> queueName: hdmi-spark, queueCurrentCapacity: 0.7162806, queueMaxCapacity: 0.08, queueApplicationCount = 7, queueChildQueueCount = 0
> 15/03/04 03:20:30 INFO yarn.Client: Max mem capabililty of a single resource in this cluster 16384
> 15/03/04 03:20:30 INFO yarn.Client: Preparing Local resources
> 15/03/04 03:20:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 15/03/04 03:20:30 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
>
> 15/03/04 03:20:46 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 7780745 for dvasthimal on 10.115.206.112:8020
> 15/03/04 03:20:46 INFO yarn.Client: Uploading file:/home/dvasthimal/spark/spark_reporting-1.0-SNAPSHOT.jar to hdfs://apollo-phx-nn.company.com:8020/user/dvasthimal/.sparkStaging/application_1425075571333_61948/spark_reporting-1.0-SNAPSHOT.jar
> 15/03/04 03:20:47 INFO yarn.Client: Uploading file:/home/dvasthimal/spark/spark-1.0.2-bin-2.4.1/lib/spark-assembly-1.0.2-hadoop2.4.1.jar to hdfs://apollo-phx-nn.company.com:8020/user/dvasthimal/.sparkStaging/application_1425075571333_61948/spark-assembly-1.0.2-hadoop2.4.1.jar
> 15/03/04 03:20:52 INFO yarn.Client: Uploading file:/home/dvasthimal/spark/avro-mapred-1.7.7-hadoop2.jar to hdfs://apollo-phx-nn.company.com:8020/user/dvasthimal/.sparkStaging/application_1425075571333_61948/avro-mapred-1.7.7-hadoop2.jar
> 15/03/04 03:20:52 INFO yarn.Client: Uploading file:/home/dvasthimal/spark/avro-1.7.7.jar to hdfs://apollo-phx-nn.company.com:8020/user/dvasthimal/.sparkStaging/application_1425075571333_61948/avro-1.7.7.jar
> 15/03/04 03:20:54 INFO yarn.Client: Setting up the launch environment
> 15/03/04 03:20:54 INFO yarn.Client: Setting up container launch context
> 15/03/04 03:20:54 INFO yarn.Client: Command for starting the Spark ApplicationMaster: List($JAVA_HOME/bin/java, -server,
> -Xmx4096m, -Djava.io.tmpdir=$PWD/tmp, -Dspark.app.name=\"com.company.ep.poc.spark.reporting.SparkApp\", -Dlog4j.configuration=log4j-spark-container.properties, org.apache.spark.deploy.yarn.ApplicationMaster, --class, com.company.ep.poc.spark.reporting.SparkApp, --jar , file:/home/dvasthimal/spark/spark_reporting-1.0-SNAPSHOT.jar, --args 'startDate=2015-02-16' --args 'endDate=2015-02-16' --args 'epoutputdirectory=/user/dvasthimal/epdatasets_small/exptsession' --args 'subcommand=successevents' --args 'outputdir=/user/dvasthimal/epdatasets/successdetail' , --executor-memory, 2048, --executor-cores, 1, --num-executors , 3, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
> 15/03/04 03:20:54 INFO yarn.Client: Submitting application to ASM
> 15/03/04 03:20:54 INFO impl.YarnClientImpl: Submitted application application_1425075571333_61948
> 15/03/04 03:20:56 INFO yarn.Client: Application report from ASM:
> application identifier: application_1425075571333_61948
> appId: 61948
> clientToAMToken: null
> appDiagnostics:
> appMasterHost: N/A
> appQueue: hdmi-spark
> appMasterRpcPort: -1
> appStartTime: 1425464454263
> yarnAppState: ACCEPTED
> distributedFinalState: UNDEFINED
> appTrackingUrl: https://apollo-phx-rm-2.company.com:50030/proxy/application_1425075571333_61948/
> appUser: dvasthimal
> 15/03/04 03:21:18 INFO yarn.Client: Application report from ASM:
> application identifier: application_1425075571333_61948
> appId: 61948
> clientToAMToken: Token { kind: YARN_CLIENT_TOKEN, service: }
> appDiagnostics:
> appMasterHost: phxaishdc9dn0169.phx.company.com
> appQueue: hdmi-spark
> appMasterRpcPort: 0
> appStartTime: 1425464454263
> yarnAppState: RUNNING
> distributedFinalState: UNDEFINED
> appTrackingUrl: https://apollo-phx-rm-2.company.com:50030/proxy/application_1425075571333_61948/
> appUser: dvasthimal
> ….
> ….
> 15/03/04 03:21:22 INFO yarn.Client: Application report from ASM:
> application identifier: application_1425075571333_61948
> appId: 61948
> clientToAMToken: Token { kind: YARN_CLIENT_TOKEN, service: }
> appDiagnostics:
> appMasterHost: phxaishdc9dn0169.phx.company.com
> appQueue: hdmi-spark
> appMasterRpcPort: 0
> appStartTime: 1425464454263
> yarnAppState: FINISHED
> distributedFinalState: FAILED
> appTrackingUrl: https://apollo-phx-rm-2.company.com:50030/proxy/application_1425075571333_61948/A
> appUser: dvasthimal
>
> AM failed with following exception
>
> /apache/hadoop/bin/yarn logs -applicationId application_1425075571333_61948
> 15/03/04 03:21:22 INFO NewHadoopRDD: Input split: hdfs://apollo-phx-nn.company.com:8020/user/dvasthimal/epdatasets_small/exptsession/2015/02/16/part-r-00000.avro:0+13890
> 15/03/04 03:21:22 ERROR Executor: Exception in task ID 3
> java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
> at org.apache.avro.mapreduce.AvroKeyInputFormat.createRecordReader(AvroKeyInputFormat.java:47)
> at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:111)
> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:99)
> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:61)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> at org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
> at org.apache.spark.scheduler.Task.run(Task.scala:51)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
>
> 1) Having figured out the error the fix would be to put the right version
> of avro libs into AM JVM classpath. Hence i included --jars
> /home/dvasthimal/spark/avro-mapred-1.7.7-hadoop2.jar,/home/dvasthimal/spark/avro-1.7.7.jar
> in spark-submit command. However i still see the same exception.
> 2) I tried to include these libs in SPARK_CLASSPATH. However i see the same
> exception.
>
> --
> Deepak
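For anyone hitting this later: the `IncompatibleClassChangeError` on `TaskAttemptContext` is the classic Hadoop 1 vs Hadoop 2 binary incompatibility. The plain `avro-mapred-1.7.7.jar` (no `hadoop2` classifier) is compiled against Hadoop 1, where `TaskAttemptContext` was a class; on Hadoop 2 it is an interface, so the Hadoop 1 build blows up at `createRecordReader`. A rough sketch (hypothetical jar paths) of scanning a classpath string for the offending artifact by its filename:

```shell
# Sketch: flag any avro-mapred jar on a classpath that LACKS the hadoop2
# classifier -- such a jar is a Hadoop 1 build and will throw
# IncompatibleClassChangeError on a Hadoop 2 cluster. Filename-based check
# only; paths below are made-up examples.
check_avro_jars() {
  cp="$1"
  bad=0
  old_ifs=$IFS
  IFS=':'
  for entry in $cp; do
    case "$entry" in
      *avro-mapred*hadoop2*.jar) ;;   # hadoop2 build: fine
      *avro-mapred*.jar)
        echo "suspect: $entry"        # hadoop1 build: breaks on Hadoop 2
        bad=1
        ;;
    esac
  done
  IFS=$old_ifs
  return $bad
}

# Example run: the middle entry is the Hadoop 1 build and gets flagged.
result=$(check_avro_jars "/x/avro-1.7.7.jar:/x/avro-mapred-1.7.7.jar:/x/avro-mapred-1.7.7-hadoop2.jar") || true
echo "$result"
```

Note this only catches jars named conventionally; an avro-mapred class baked into a fat assembly jar (e.g. the Spark assembly itself) would not show up and still shadows the `--jars` copies, which is one plausible reason both of the attempts above failed.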