In standalone cluster mode I also hit a scala.MatchError. It also looks like the --jars configuration is not passed to the driver/worker nodes? (Copying from file:/<path> doesn't seem correct either; shouldn't it copy from http://<master>/<path>?)
...
14/06/17 04:15:30 INFO Worker: Asked to launch driver driver-20140617041530-0000
14/06/17 04:15:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/06/17 04:15:30 INFO DriverRunner: Copying user jar file:/x/home/jianshuang/tmp/rtgraph.jar to /x/home/jianshuang/spark/spark-1.0.0-hadoop2.4.0/work/driver-20140617041530-0000/rtgraph.jar
Spark assembly has been built with Hive, including Datanucleus jars on classpath
14/06/17 04:15:30 INFO DriverRunner: Launch Command: "/usr/java/jdk1.7.0_40/bin/java" "-cp" "/x/home/jianshuang/spark/spark-1.0.0-hadoop2.4.0/work/driver-20140617041530-0000/rtgraph.jar:::/x/home/jianshuang/spark/spark-1.0.0-hadoop2.4.0/conf:/x/home/jianshuang/spark/spark-1.0.0-hadoop2.4.0/lib/spark-assembly-1.0.0-hadoop2.4.0.jar:/x/home/jianshuang/spark/spark-1.0.0-hadoop2.4.0/lib/datanucleus-api-jdo-3.2.1.jar:/x/home/jianshuang/spark/spark-1.0.0-hadoop2.4.0/lib/datanucleus-core-3.2.2.jar:/x/home/jianshuang/spark/spark-1.0.0-hadoop2.4.0/lib/datanucleus-rdbms-3.2.1.jar:/etc/hadoop/conf:/usr/lib/hadoop-yarn/conf" "-XX:MaxPermSize=128m" "-Xms512M" "-Xmx512M" "org.apache.spark.deploy.worker.DriverWrapper" "akka.tcp://sparkwor...@lvshdc5dn0321.lvs.paypal.com:41987/user/Worker" "com.paypal.rtgraph.demo.MapReduceWriter"
14/06/17 04:15:32 ERROR OneForOneStrategy: FAILED (of class scala.Enumeration$Val)
scala.MatchError: FAILED (of class scala.Enumeration$Val)
        at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:317)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
14/06/17 04:15:32 INFO Worker: Starting Spark worker lvshdc5dn0321.lvs.paypal.com:41987 with 32 cores, 125.0 GB RAM
14/06/17 04:15:32 INFO Worker: Spark home: /x/home/jianshuang/spark/spark-1.0.0-hadoop2.4.0
14/06/17 04:15:32 INFO WorkerWebUI: Started WorkerWebUI at http://lvshdc5dn0321.lvs.paypal.com:8081
14/06/17 04:15:32 INFO Worker: Connecting to master spark://lvshdc5en0015.lvs.paypal.com:7077...
14/06/17 04:15:32 ERROR Worker: Worker registration failed: Attempted to re-register worker at same address: akka.tcp://sparkwor...@lvshdc5dn0321.lvs.paypal.com:41987

Is that a bug?

Jianshi

On Tue, Jun 17, 2014 at 5:41 PM, Jianshi Huang <jianshi.hu...@gmail.com> wrote:
> Hi,
>
> I'm stuck using either yarn-client or standalone-client mode. Both get
> stuck when I submit jobs; the last messages printed were:
>
> ...
> 14/06/17 02:37:17 INFO spark.SparkContext: Added JAR file:/x/home/jianshuang/tmp/lib/commons-vfs2.jar at http://10.196.195.25:56377/jars/commons-vfs2.jar with timestamp 1402997837065
> 14/06/17 02:37:17 INFO spark.SparkContext: Added JAR file:/x/home/jianshuang/tmp/rtgraph.jar at http://10.196.195.25:56377/jars/rtgraph.jar with timestamp 1402997837065
> 14/06/17 02:37:17 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
> 14/06/17 02:37:17 INFO yarn.ApplicationMaster$$anon$1: Adding shutdown hook for context org.apache.spark.SparkContext@6655cf60
>
> I can use yarn-cluster to run my app, but it's not very convenient for
> monitoring the progress.
>
> Standalone cluster mode doesn't work either; it reports a file-not-found error:
>
> Driver successfully submitted as driver-20140617023956-0003
> ... waiting before polling master for driver state
> ... polling master for driver state
> State of driver-20140617023956-0003 is ERROR
> Exception from cluster was: java.io.FileNotFoundException: File file:/x/home/jianshuang/tmp/rtgraph.jar does not exist
>
> I'm using Spark 1.0.0 and my submit command looks like this:
>
> ~/spark/spark-1.0.0-hadoop2.4.0/bin/spark-submit --name 'rtgraph' --class com.paypal.rtgraph.demo.MapReduceWriter --master spark://lvshdc5en0015.lvs.paypal.com:7077 --jars `find lib -type f | tr '\n' ','` --executor-memory 20G --total-executor-cores 96 --deploy-mode cluster rtgraph.jar
>
> The jars I put in the --jars option are:
>
> accumulo-core.jar
> accumulo-fate.jar
> accumulo-minicluster.jar
> accumulo-trace.jar
> accumulo-tracer.jar
> chill_2.10-0.3.6.jar
> commons-math.jar
> commons-vfs2.jar
> config-1.2.1.jar
> gson.jar
> guava.jar
> joda-convert-1.2.jar
> joda-time-2.3.jar
> kryo-2.21.jar
> libthrift.jar
> quasiquotes_2.10-2.0.0-M8.jar
> scala-async_2.10-0.9.1.jar
> scala-library-2.10.4.jar
> scala-reflect-2.10.4.jar
>
> Does anyone have a hint as to what went wrong? Really confused.
>
> Cheers,
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/

--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
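One side note on the submit command above: `find lib -type f | tr '\n' ','` leaves a trailing comma on the --jars value and would also pick up any non-jar files under lib. A minimal sketch of a cleaner way to build the list (the `jars_list` helper name and the ./lib layout are hypothetical):

```shell
# Hypothetical helper: build a comma-separated --jars list from a directory.
# Keeps only *.jar files, sorts for deterministic output, and joins with
# commas via `paste` so there is no trailing comma.
jars_list() {
  find "$1" -type f -name '*.jar' | sort | paste -s -d , -
}

mkdir -p lib     # ensure the (hypothetical) lib directory exists
jars_list lib    # print the list (empty if lib contains no jars)
```

The result would then be passed as `--jars "$(jars_list lib)"`. As for the FileNotFoundException itself: in cluster deploy mode the driver is launched on a worker node, so the application-jar path is resolved there, not on the submitting machine; if I read the spark-submit docs correctly, in that mode the jar URL must be visible cluster-wide, e.g. an hdfs:// path or a local path present at the same location on every node.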