I'm using the spark-cassandra-connector from DataStax in a Spark Streaming job launched from my own driver. It connects to a standalone cluster on my local box, which has two workers running.
This is Spark 1.3.1 with spark-cassandra-connector-1.3.0-SNAPSHOT. I have added the following entry to my `$SPARK_HOME/conf/spark-defaults.conf`:

```
spark.executor.extraClassPath /projects/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.3.0-SNAPSHOT.jar
```

When I start the master with `$SPARK_HOME/sbin/start-master.sh`, it comes up just fine. As do the two workers, started with the following commands:

Worker 1, web UI on port 8081:

```
radtech:spark $ ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://radtech.io:7077 --webui-port 8081 --cores 2
```

Worker 2, web UI on port 8082:

```
radtech:spark $ ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://radtech.io:7077 --webui-port 8082 --cores 2
```

When I execute the driver, connecting to the master:

```
sbt app/run -Dspark.master=spark://radtech.io:7077
```

it starts up, but when the executors are launched their command line does not include the entry from `spark.executor.extraClassPath`:

```
15/05/22 17:35:26 INFO Worker: Asked to launch executor app-20150522173526-0000/0 for KillrWeatherApp$
15/05/22 17:35:26 INFO ExecutorRunner: Launch command: "java" "-cp" "/usr/local/spark/conf:/usr/local/spark/lib/spark-assembly-1.3.1-hadoop2.6.0.jar:/usr/local/spark/lib/datanucleus-api-jdo-3.2.6.jar:/usr/local/spark/lib/datanucleus-core-3.2.10.jar:/usr/local/spark/lib/datanucleus-rdbms-3.2.9.jar:/usr/local/spark/conf:/usr/local/spark/lib/spark-assembly-1.3.1-hadoop2.6.0.jar:/usr/local/spark/lib/datanucleus-api-jdo-3.2.6.jar:/usr/local/spark/lib/datanucleus-core-3.2.10.jar:/usr/local/spark/lib/datanucleus-rdbms-3.2.9.jar" "-Dspark.driver.port=55932" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "akka.tcp://sparkDriver@192.168.1.3:55932/user/CoarseGrainedScheduler" "--executor-id" "0" "--hostname" "192.168.1.3" "--cores" "2" "--app-id" "app-20150522173526-0000" "--worker-url" "akka.tcp://sparkWorker@192.168.1.3:55923/user/Worker"
```

which then causes the executor to fail with a `ClassNotFoundException`, as I would expect:

```
[WARN] [2015-05-22 17:38:18,035] [org.apache.spark.scheduler.TaskSetManager]: Lost task 0.0 in stage 2.0 (TID 23, 192.168.1.3): java.lang.ClassNotFoundException: com.datastax.spark.connector.rdd.partitioner.CassandraPartition
    at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:344)
    at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:65)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
```

I also notice that some of the entries on the executor classpath are duplicated.

This is a newly installed `spark-1.3.1-bin-hadoop2.6` standalone cluster, just to ensure I had nothing left over from testing in the way. I can set `SPARK_CLASSPATH` in `$SPARK_HOME/conf/spark-env.sh` and it will pick up the jar and append it fine.

Any suggestions on what is going on here? Spark seems to just ignore whatever I have in `spark.executor.extraClassPath`. Is there a different way to do this?

TIA.

-Todd
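P.S. One thing I may try next is setting the property programmatically on the `SparkConf` in my driver instead of relying on `spark-defaults.conf`. An untested sketch, using the master URL and assembly path from my setup (the app name is just a placeholder):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Untested sketch: set the same property that spark-defaults.conf should
// supply, but directly in the driver code. The jar path is the connector
// assembly I built locally; it must already exist on each worker box for
// extraClassPath to do anything.
val conf = new SparkConf()
  .setAppName("KillrWeatherApp")
  .setMaster("spark://radtech.io:7077")
  .set("spark.executor.extraClassPath",
       "/projects/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.3.0-SNAPSHOT.jar")

val sc = new SparkContext(conf)
```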
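Alternatively, if there is a way to have Spark ship the jar to the executors itself, that might sidestep the classpath configuration entirely. I believe `SparkConf.setJars` (or `--jars` with spark-submit) is meant to do this; again an untested sketch:

```scala
import org.apache.spark.SparkConf

// Untested sketch: setJars lists jars that Spark distributes to the
// executors when the application starts and adds to the task class
// loader, so the assembly would not need to be pre-deployed on workers.
val conf = new SparkConf()
  .setAppName("KillrWeatherApp")
  .setMaster("spark://radtech.io:7077")
  .setJars(Seq("/projects/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.3.0-SNAPSHOT.jar"))
```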