Hi David, 

Just package with Maven and deploy everything as a single jar; you don't need 
to do it like this. Also check whether your cluster already has these 
libraries loaded: on CDH, for example, you can simply import the classes 
because they're already on your JVM's classpath.
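
If you'd still rather wire the jars up from Python for your tests, setting 
PYSPARK_SUBMIT_ARGS before the context starts the JVM should behave like 
spark-submit's --jars. A rough sketch (the assembly jar path here is made up):

import os
from pyspark import SparkConf, SparkContext

# Hypothetical path: a single "fat" jar built with Maven (e.g. the Shade
# plugin) bundling spark-streaming-kafka, the Kafka client and friends.
ASSEMBLY_JAR = '/path/to/my-app-assembly.jar'

# Must be set before SparkContext launches the JVM; it is read at startup
# and the trailing 'pyspark-shell' token is required on Spark 1.4+.
os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars {0} pyspark-shell'.format(ASSEMBLY_JAR)

sc = SparkContext(conf=SparkConf().setMaster('local[4]'))
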
> On 15/02/2016, at 10:35, David Kennedy <dar...@gmail.com> wrote:
> 
> I have no problems when submitting the task using spark-submit.  The --jars 
> option with the list of jars required is successful and I see in the output 
> the jars being added:
> 
> 16/02/10 11:14:24 INFO spark.SparkContext: Added JAR 
> file:/usr/lib/spark/extras/lib/spark-streaming-kafka.jar at 
> http://192.168.10.4:33820/jars/spark-streaming-kafka.jar with timestamp 
> 1455102864058
> 16/02/10 11:14:24 INFO spark.SparkContext: Added JAR 
> file:/opt/kafka/libs/scala-library-2.10.1.jar at 
> http://192.168.10.4:33820/jars/scala-library-2.10.1.jar with timestamp 
> 1455102864077
> 16/02/10 11:14:24 INFO spark.SparkContext: Added JAR 
> file:/opt/kafka/libs/kafka_2.10-0.8.1.1.jar at 
> http://192.168.10.4:33820/jars/kafka_2.10-0.8.1.1.jar with timestamp 
> 1455102864085
> 16/02/10 11:14:24 INFO spark.SparkContext: Added JAR 
> file:/opt/kafka/libs/metrics-core-2.2.0.jar at 
> http://192.168.10.4:33820/jars/metrics-core-2.2.0.jar with timestamp 
> 1455102864086
> 16/02/10 11:14:24 INFO spark.SparkContext: Added JAR 
> file:/usr/share/java/mysql.jar at http://192.168.10.4:33820/jars/mysql.jar 
> with timestamp 1455102864090
> 
> But when I try to create a context programmatically in Python (I want to set 
> up some tests), I don't see this, and I end up with class-not-found errors.
> 
> Trying to cover all bases, even though I suspect it's redundant when 
> running locally, I've tried:
> 
> jar_paths = ','.join([
>     '/usr/lib/spark/extras/lib/spark-streaming-kafka.jar',
>     '/opt/kafka/libs/scala-library-2.10.1.jar',
>     '/opt/kafka/libs/kafka_2.10-0.8.1.1.jar',
>     '/opt/kafka/libs/metrics-core-2.2.0.jar',
>     '/usr/share/java/mysql.jar',
> ])
> 
> spark_conf = SparkConf()
> spark_conf.setMaster('local[4]')
> spark_conf.set('spark.executor.extraLibraryPath', jar_paths)
> spark_conf.set('spark.executor.extraClassPath', jar_paths)
> spark_conf.set('spark.driver.extraClassPath', jar_paths)
> spark_conf.set('spark.driver.extraLibraryPath', jar_paths)
> self.spark_context = SparkContext(conf=spark_conf)
> 
> But still I get the same failure to find the same class:
> 
> Py4JJavaError: An error occurred while calling o30.loadClass.
> : java.lang.ClassNotFoundException: 
> org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper
> 
> The class is certainly in spark-streaming-kafka.jar, and the jar is present 
> in the filesystem at that location.
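> 
> For reference, I can reproduce the lookup failure directly with a Py4J 
> probe (_jvm is an internal PySpark attribute, so this is debugging only):
> 
> # Raises Py4JJavaError wrapping java.lang.ClassNotFoundException when the
> # class is not on the driver JVM's classpath.
> self.spark_context._jvm.java.lang.Class.forName(
>     'org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper')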
> 
> I'm under the impression that if I were using Java I'd be able to use the 
> setJars method on the conf to take care of this, but there doesn't appear 
> to be anything corresponding for Python.
> 
> Hacking about, I found that adding:
> 
> spark_conf.set('spark.jars', jar_paths)
> got the logging to show the jars being added to the HTTP server (just as in 
> the spark-submit output above), but whether I leave the other config options 
> in place or remove them, the class is still not found.
> 
> Is this not possible in Python?
> 
> Incidentally, I have tried SPARK_CLASSPATH (which just gets me the message 
> that it's deprecated and ignored) and I cannot find anything else to try.
> 
> Can anybody help?
> 
> David K.
> 
