Hi Tobias,

Regarding my comment on closure serialization: I was discussing it with my
fellow Sparkers here, and I totally overlooked the fact that you need the
class files to deserialize the closures on the workers, so the application
jar always has to be delivered to the workers in order for this to work.
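To see why, note that Java serialization only writes the closure's class name
plus its captured state; deserialization then has to look the class up by
name. Here is a tiny self-contained sketch of that mechanism (plain JVM
serialization, names made up for illustration, nothing Spark-specific):

    import java.io.{ByteArrayInputStream, ByteArrayOutputStream,
      ObjectInputStream, ObjectOutputStream}

    object ClosureRoundTrip {
      def main(args: Array[String]): Unit = {
        // A Scala closure compiles to an anonymous class (something like
        // ClosureRoundTrip$$anonfun$1) that implements Serializable.
        val f: Int => Int = x => x + 1

        val bos = new ByteArrayOutputStream()
        val oos = new ObjectOutputStream(bos)
        oos.writeObject(f) // writing works: only class name + captured state
        oos.close()

        // Reading calls resolveClass/Class.forName on the recorded name.
        // It succeeds here because this JVM has the class files; a worker
        // that never received the jar fails at exactly this point with
        // java.lang.ClassNotFoundException.
        val ois = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray))
        val g = ois.readObject().asInstanceOf[Int => Int]
        println(g(41)) // prints 42
      }
    }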
The Spark REPL works differently: as far as I can tell, it compiles each line
you type into class files and serves them to the executors on the fly, which
is the dark magic that sends your working session to the workers without a
jar.

-kr, Gerard.

On Wed, May 21, 2014 at 2:47 PM, Gerard Maas <gerard.m...@gmail.com> wrote:
> Hi Tobias,
>
> I was curious about this issue and tried to run your example on my local
> Mesos. I was able to reproduce your issue using your current config:
>
> [error] (run-main-0) org.apache.spark.SparkException: Job aborted: Task
> 1.0:4 failed 4 times (most recent failure: Exception failure:
> java.lang.ClassNotFoundException: spark.SparkExamplesMinimal$$anonfun$2)
> org.apache.spark.SparkException: Job aborted: Task 1.0:4 failed 4 times
> (most recent failure: Exception failure: java.lang.ClassNotFoundException:
> spark.SparkExamplesMinimal$$anonfun$2)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
>
> Creating a simple jar from the job and providing it through the
> configuration seems to solve it:
>
>     val conf = new SparkConf()
>       .setMaster("mesos://<my_ip>:5050/")
>       .setJars(Seq("/sparkexample/target/scala-2.10/sparkexample_2.10-0.1.jar"))
>       .setAppName("SparkExamplesMinimal")
>
> Resulting in:
>
> 14/05/21 12:03:45 INFO scheduler.DAGScheduler: Completed ResultTask(1, 1)
> 14/05/21 12:03:45 INFO scheduler.DAGScheduler: Stage 1 (count at
> SparkExamplesMinimal.scala:50) finished in 1.120 s
> 14/05/21 12:03:45 INFO spark.SparkContext: Job finished: count at
> SparkExamplesMinimal.scala:50, took 1.177091435 s
> count: 1000000
>
> Why the closure serialization does not work with Mesos is beyond my
> current knowledge.
> It would be great to hear from the experts (cross-posting to dev for that).
>
> -kr, Gerard.
>
> On Wed, May 21, 2014 at 11:51 AM, Tobias Pfeiffer <t...@preferred.jp> wrote:
>> Hi,
>>
>> I have set up a cluster with Mesos (backed by Zookeeper) with three
>> master and three slave instances. I set up Spark (git HEAD) for use
>> with Mesos according to this manual:
>> http://people.apache.org/~pwendell/catalyst-docs/running-on-mesos.html
>>
>> Using the spark-shell, I can connect to this cluster and do simple RDD
>> operations, but the same code in a Scala class and executed via sbt
>> run-main works only partially. (That is, count() works, but count()
>> after flatMap() does not.)
>>
>> Here is my code: https://gist.github.com/tgpfeiffer/7d20a4d59ee6e0088f91
>> The file SparkExamplesScript.scala, when pasted into spark-shell,
>> outputs the correct count() for the parallelized list comprehension,
>> as well as for the flatMapped RDD.
>>
>> The file SparkExamplesMinimal.scala contains exactly the same code,
>> and also the MASTER configuration and the Spark Executor are the same.
>> However, while the count() for the parallelized list is displayed
>> correctly, I receive the following error when asking for the count()
>> of the flatMapped RDD:
>>
>> -----------------
>>
>> 14/05/21 09:47:49 INFO scheduler.DAGScheduler: Submitting Stage 1
>> (FlatMappedRDD[1] at flatMap at SparkExamplesMinimal.scala:34), which
>> has no missing parents
>> 14/05/21 09:47:49 INFO scheduler.DAGScheduler: Submitting 8 missing
>> tasks from Stage 1 (FlatMappedRDD[1] at flatMap at
>> SparkExamplesMinimal.scala:34)
>> 14/05/21 09:47:49 INFO scheduler.TaskSchedulerImpl: Adding task set
>> 1.0 with 8 tasks
>> 14/05/21 09:47:49 INFO scheduler.TaskSetManager: Starting task 1.0:0
>> as TID 8 on executor 20140520-102159-2154735808-5050-1108-1: mesos9-1
>> (PROCESS_LOCAL)
>> 14/05/21 09:47:49 INFO scheduler.TaskSetManager: Serialized task 1.0:0
>> as 1779147 bytes in 37 ms
>> 14/05/21 09:47:49 WARN scheduler.TaskSetManager: Lost TID 8 (task 1.0:0)
>> 14/05/21 09:47:49 WARN scheduler.TaskSetManager: Loss was due to
>> java.lang.ClassNotFoundException
>> java.lang.ClassNotFoundException: spark.SparkExamplesMinimal$$anonfun$2
>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>     at java.lang.Class.forName0(Native Method)
>>     at java.lang.Class.forName(Class.java:270)
>>     at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60)
>>     at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
>>     at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
>>     at org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:61)
>>     at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:141)
>>     at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837)
>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
>>     at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:85)
>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:169)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>     at java.lang.Thread.run(Thread.java:745)
>>
>> -----------------
>>
>> Can anyone explain to me where this comes from or how I
>> might further track the problem down?
>>
>> Thanks,
>> Tobias
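For completeness, the minimal job with the fix applied looks roughly like
this (a sketch reconstructed from the thread, not the actual gist; the
master address and jar path are the ones from my test above, so adjust them
to your setup and build the jar with `sbt package` first):

    import org.apache.spark.{SparkConf, SparkContext}

    object SparkExamplesMinimal {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setMaster("mesos://<my_ip>:5050/")
          // Ship the application jar to the executors so they can load the
          // anonymous closure classes (e.g. SparkExamplesMinimal$$anonfun$2).
          .setJars(Seq("/sparkexample/target/scala-2.10/sparkexample_2.10-0.1.jar"))
          .setAppName("SparkExamplesMinimal")
        val sc = new SparkContext(conf)

        val rdd = sc.parallelize(1 to 1000000)
        // This count() worked even without the jar: it uses no user closure.
        println("count: " + rdd.count())
        // The flatMap closure is where deserialization on the workers failed
        // with ClassNotFoundException before setJars was added.
        println("flatMap count: " + rdd.flatMap(i => Seq(i)).count())
        sc.stop()
      }
    }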