I'm using the spark-cassandra-connector from DataStax in a Spark Streaming job launched from my own driver. It connects to a standalone cluster on my local box, which has two workers running.
This is Spark 1.3.1 with spark-cassandra-connector-1.3.0-SNAPSHOT. I have added the following entry to my `$SPARK_HOME/conf/spark-defaults.conf`:

```
spark.executor.extraClassPath /projects/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.3.0-SNAPSHOT.jar
```

When I start the master with `$SPARK_HOME/sbin/start-master.sh`, it comes up just fine. As do the two workers, started with the following commands:

Worker 1, web UI on port 8081:

```
radtech:spark $ ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://radtech.io:7077 --webui-port 8081 --cores 2
```

Worker 2, web UI on port 8082:

```
radtech:spark $ ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://radtech.io:7077 --webui-port 8082 --cores 2
```

When I execute the driver, connecting to the master:

```
sbt app/run -Dspark.master=spark://radtech.io:7077
```

it starts up, but when the executors are launched their command line does not include the entry from `spark.executor.extraClassPath`:

```
15/05/22 17:35:26 INFO Worker: Asked to launch executor app-20150522173526-0000/0 for KillrWeatherApp$
15/05/22 17:35:26 INFO ExecutorRunner: Launch command: "java" "-cp" "/usr/local/spark/conf:/usr/local/spark/lib/spark-assembly-1.3.1-hadoop2.6.0.jar:/usr/local/spark/lib/datanucleus-api-jdo-3.2.6.jar:/usr/local/spark/lib/datanucleus-core-3.2.10.jar:/usr/local/spark/lib/datanucleus-rdbms-3.2.9.jar:/usr/local/spark/conf:/usr/local/spark/lib/spark-assembly-1.3.1-hadoop2.6.0.jar:/usr/local/spark/lib/datanucleus-api-jdo-3.2.6.jar:/usr/local/spark/lib/datanucleus-core-3.2.10.jar:/usr/local/spark/lib/datanucleus-rdbms-3.2.9.jar" "-Dspark.driver.port=55932" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "akka.tcp://sparkDriver@192.168.1.3:55932/user/CoarseGrainedScheduler" "--executor-id" "0" "--hostname" "192.168.1.3" "--cores" "2" "--app-id" "app-20150522173526-0000" "--worker-url" "akka.tcp://sparkWorker@192.168.1.3:55923/user/Worker"
```

which then causes the executor to fail with a `ClassNotFoundException`, as I would expect:

```
[WARN] [2015-05-22 17:38:18,035] [org.apache.spark.scheduler.TaskSetManager]: Lost task 0.0 in stage 2.0 (TID 23, 192.168.1.3): java.lang.ClassNotFoundException: com.datastax.spark.connector.rdd.partitioner.CassandraPartition
    at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:344)
    at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:65)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
```

I also notice that some of the entries on the executor classpath are duplicated.

This is a newly installed `spark-1.3.1-bin-hadoop2.6` standalone cluster, just to ensure I had nothing left over from testing in the way. I can set `SPARK_CLASSPATH` in `$SPARK_HOME/conf/spark-env.sh` and it will pick up the jar and append it fine.

Any suggestions on what is going on here? Spark seems to just ignore whatever I have in `spark.executor.extraClassPath`. Is there a different way to do this?

TIA.

-Todd
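P.S. One thing I may try next is setting the property programmatically on the `SparkConf` in my driver instead of relying on `spark-defaults.conf`. An untested sketch, using the master URL and assembly path from my setup (the app name is just a placeholder):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Untested sketch: set the same property that spark-defaults.conf should
// supply, but directly in the driver code. The jar path is the connector
// assembly I built locally; it must already exist on each worker box for
// extraClassPath to do anything.
val conf = new SparkConf()
  .setAppName("KillrWeatherApp")
  .setMaster("spark://radtech.io:7077")
  .set("spark.executor.extraClassPath",
       "/projects/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.3.0-SNAPSHOT.jar")

val sc = new SparkContext(conf)
```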
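Alternatively, if there is a way to have Spark ship the jar to the executors itself, that might sidestep the classpath configuration entirely. I believe `SparkConf.setJars` (or `--jars` with spark-submit) is meant to do this; again an untested sketch:

```scala
import org.apache.spark.SparkConf

// Untested sketch: setJars lists jars that Spark distributes to the
// executors when the application starts and adds to the task class
// loader, so the assembly would not need to be pre-deployed on workers.
val conf = new SparkConf()
  .setAppName("KillrWeatherApp")
  .setMaster("spark://radtech.io:7077")
  .setJars(Seq("/projects/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.3.0-SNAPSHOT.jar"))
```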