Also, the reason spark-streaming-kafka is not included in the Spark
assembly is that we do not want dependencies of external systems like Kafka
(which itself probably has a complex dependency tree) to cause conflicts
with core Spark's functionality and stability.

TD


On Sun, Jul 13, 2014 at 5:48 PM, Tathagata Das <tathagata.das1...@gmail.com>
wrote:

> In case you still have issues with duplicate files in uber jar, here is a
> reference sbt file with assembly plugin that deals with duplicates
>
>
> https://github.com/databricks/training/blob/sparkSummit2014/streaming/scala/build.sbt
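>
> For reference, the heart of that file is a merge-strategy setting along
> these lines (a rough sketch of the sbt-assembly 0.11.x syntax; defer to
> the linked build.sbt for the exact rules):
>
>     import AssemblyKeys._
>
>     assemblySettings
>
>     mergeStrategy in assembly := {
>       case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
>       case "reference.conf" => MergeStrategy.concat
>       case _ => MergeStrategy.first
>     }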
>
>
> On Fri, Jul 11, 2014 at 10:06 AM, Bill Jay <bill.jaypeter...@gmail.com>
> wrote:
>
>> You may try to use this one:
>>
>> https://github.com/sbt/sbt-assembly
>>
>> I had an issue with duplicate files in the uber jar, but this plugin
>> will assemble all the dependencies into a single jar file.
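>>
>> To enable it, the plugin line goes in project/plugins.sbt (the version
>> here is just an example; use the latest release):
>>
>>     addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")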
>>
>> Bill
>>
>>
>> On Fri, Jul 11, 2014 at 1:34 AM, Dilip <dilip_ram...@hotmail.com> wrote:
>>
>>> A simple
>>>     sbt assembly
>>> is not working. Is there any other way to include particular jars with
>>> the assembly command?
>>>
>>> Regards,
>>> Dilip
>>>
>>> On Friday 11 July 2014 12:45 PM, Bill Jay wrote:
>>>
>>> I have run into similar issues. The reason is probably that
>>> spark-streaming-kafka is not included in the Spark assembly. Currently,
>>> I am using Maven to generate a shaded package with all the dependencies.
>>> You may try sbt assembly to include the dependencies in your jar file.
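>>>
>>> One pattern that keeps the uber jar manageable (a sketch, not the exact
>>> build I use) is to mark the core Spark artifacts as "provided", since
>>> they are already in the Spark assembly, so that only spark-streaming-kafka
>>> and its Kafka dependencies get bundled:
>>>
>>>     libraryDependencies ++= Seq(
>>>       "org.apache.spark" %% "spark-core" % "1.0.0" % "provided",
>>>       "org.apache.spark" %% "spark-streaming" % "1.0.0" % "provided",
>>>       "org.apache.spark" % "spark-streaming-kafka_2.10" % "1.0.0"
>>>     )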
>>>
>>>  Bill
>>>
>>>
>>> On Thu, Jul 10, 2014 at 11:48 PM, Dilip <dilip_ram...@hotmail.com>
>>> wrote:
>>>
>>>>  Hi Akhil,
>>>>
>>>> Can you please guide me through this? The code I am running
>>>> already has this in it:
>>>>     SparkContext sc = new SparkContext();
>>>>     sc.addJar("/usr/local/spark/external/kafka/target/scala-2.10/spark-streaming-kafka_2.10-1.1.0-SNAPSHOT.jar");
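>>>>
>>>> (Note: sc.addJar only distributes the jar to the executors, while the
>>>> stack trace further down in this thread shows the class failing to load
>>>> on the driver, inside SimpleJavaApp.main. So the Kafka classes also need
>>>> to be on the driver's classpath, e.g. bundled into the application jar,
>>>> or, assuming spark-submit from Spark 1.0, roughly:
>>>>
>>>>     spark-submit --jars /usr/local/spark/external/kafka/target/scala-2.10/spark-streaming-kafka_2.10-1.1.0-SNAPSHOT.jar \
>>>>       --class SimpleJavaApp your-app.jar
>>>>
>>>> where your-app.jar stands in for the actual application jar.)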
>>>>
>>>>
>>>> Is there something I am missing?
>>>>
>>>> Thanks,
>>>> Dilip
>>>>
>>>>
>>>> On Friday 11 July 2014 12:02 PM, Akhil Das wrote:
>>>>
>>>> The easiest fix would be to add the Kafka jars to the SparkContext
>>>> while creating it.
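>>>>
>>>> Something along these lines might work (a sketch only; the jar path is
>>>> the one from your earlier message, and the app name is hypothetical):
>>>>
>>>>     import org.apache.spark.{SparkConf, SparkContext}
>>>>
>>>>     val conf = new SparkConf()
>>>>       .setAppName("KafkaTest") // hypothetical name
>>>>       .setJars(Seq("/usr/local/spark/external/kafka/target/scala-2.10/spark-streaming-kafka_2.10-1.1.0-SNAPSHOT.jar"))
>>>>     val sc = new SparkContext(conf)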
>>>>
>>>>  Thanks
>>>> Best Regards
>>>>
>>>>
>>>> On Fri, Jul 11, 2014 at 4:39 AM, Dilip <dilip_ram...@hotmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am trying to run a Spark Streaming program using Kafka on a
>>>>> standalone system. These are my details:
>>>>>
>>>>> Spark 1.0.0 hadoop2
>>>>> Scala 2.10.3
>>>>>
>>>>> I am trying a simple program using my own sbt project, but this is
>>>>> the error I am getting:
>>>>>
>>>>> Exception in thread "main" java.lang.NoClassDefFoundError:
>>>>> kafka/serializer/StringDecoder
>>>>>     at
>>>>> org.apache.spark.streaming.kafka.KafkaUtils$.createStream(KafkaUtils.scala:55)
>>>>>     at
>>>>> org.apache.spark.streaming.kafka.KafkaUtils$.createStream(KafkaUtils.scala:94)
>>>>>     at
>>>>> org.apache.spark.streaming.kafka.KafkaUtils.createStream(KafkaUtils.scala)
>>>>>     at SimpleJavaApp.main(SimpleJavaApp.java:40)
>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>     at
>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>>     at
>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>>>>     at
>>>>> org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:303)
>>>>>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
>>>>>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>> kafka.serializer.StringDecoder
>>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>>>     ... 11 more
>>>>>
>>>>>
>>>>> here is my .sbt file:
>>>>>
>>>>> name := "Simple Project"
>>>>>
>>>>> version := "1.0"
>>>>>
>>>>> scalaVersion := "2.10.3"
>>>>>
>>>>> libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"
>>>>>
>>>>> libraryDependencies += "org.apache.spark" %% "spark-streaming" %
>>>>> "1.0.0"
>>>>>
>>>>> libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.0.0"
>>>>>
>>>>> libraryDependencies += "org.apache.spark" %% "spark-examples" % "1.0.0"
>>>>>
>>>>> libraryDependencies += "org.apache.spark" %
>>>>> "spark-streaming-kafka_2.10" % "1.0.0"
>>>>>
>>>>> libraryDependencies += "org.apache.kafka" %% "kafka" % "0.8.0"
>>>>>
>>>>> resolvers += "Akka Repository" at "http://repo.akka.io/releases/";
>>>>>
>>>>> resolvers += "Maven Repository" at "http://central.maven.org/maven2/";
>>>>>
>>>>>
>>>>> sbt package was successful. I also tried sbt "++2.10.3 package" to
>>>>> build it for my Scala version. The problem remains the same.
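>>>>>
>>>>> (Side note: sbt package only jars up the project's own classes; none
>>>>> of the libraryDependencies above, spark-streaming-kafka and kafka
>>>>> included, end up in that jar, which matches the NoClassDefFoundError.
>>>>> The sbt-assembly suggestions earlier in the thread address exactly
>>>>> this:
>>>>>
>>>>>     sbt assembly
>>>>>
>>>>> produces a single jar with the dependencies bundled.)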
>>>>> Can anyone help me out here? I've been stuck on this for quite some
>>>>> time now.
>>>>>
>>>>> Thank You,
>>>>> Dilip
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>
