Also, the reason spark-streaming-kafka is not included in the Spark assembly is that we do not want dependencies of external systems like Kafka (which itself probably has a complex dependency tree) to cause conflicts with core Spark's functionality and stability.
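A minimal sbt-assembly setup along these lines might look like the sketch below. (This is a rough sketch, not the exact contents of the build.sbt linked further down in the thread; the "provided" scoping, the merge rules, and the plugin version are illustrative assumptions.)

// project/plugins.sbt -- plugin version is illustrative
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")

// build.sbt (sbt-assembly 0.11.x style; setting names differ in later versions)
import AssemblyKeys._

assemblySettings

name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.3"

// Spark itself is provided by the cluster, so only spark-streaming-kafka
// (and its Kafka dependencies) need to end up in the uber jar.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % "1.0.0" % "provided",
  "org.apache.spark" %% "spark-streaming" % "1.0.0" % "provided",
  "org.apache.spark" %  "spark-streaming-kafka_2.10" % "1.0.0"
)

// Resolve the duplicate-file errors sbt assembly reports for META-INF entries.
mergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _                             => MergeStrategy.first
}

Running "sbt assembly" then produces a single uber jar that can be submitted with spark-submit.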
TD

On Sun, Jul 13, 2014 at 5:48 PM, Tathagata Das <tathagata.das1...@gmail.com> wrote:

> In case you still have issues with duplicate files in the uber jar, here is a
> reference sbt file with the assembly plugin that deals with duplicates:
>
> https://github.com/databricks/training/blob/sparkSummit2014/streaming/scala/build.sbt
>
>
> On Fri, Jul 11, 2014 at 10:06 AM, Bill Jay <bill.jaypeter...@gmail.com> wrote:
>
>> You may try to use this one:
>>
>> https://github.com/sbt/sbt-assembly
>>
>> I had an issue with duplicate files in the uber jar file, but I think this
>> library will assemble the dependencies into a single jar file.
>>
>> Bill
>>
>>
>> On Fri, Jul 11, 2014 at 1:34 AM, Dilip <dilip_ram...@hotmail.com> wrote:
>>
>>> A simple
>>>     sbt assembly
>>> is not working. Is there any other way to include particular jars with
>>> the assembly command?
>>>
>>> Regards,
>>> Dilip
>>>
>>> On Friday 11 July 2014 12:45 PM, Bill Jay wrote:
>>>
>>> I have run into similar issues. The reason is probably that
>>> spark-streaming-kafka is not included in the Spark assembly. Currently,
>>> I am using Maven to generate a shaded package with all the dependencies.
>>> You may try to use sbt assembly to include the dependencies in your jar file.
>>>
>>> Bill
>>>
>>>
>>> On Thu, Jul 10, 2014 at 11:48 PM, Dilip <dilip_ram...@hotmail.com> wrote:
>>>
>>>> Hi Akhil,
>>>>
>>>> Can you please guide me through this? The code I am running already has
>>>> this in it:
>>>>
>>>>     SparkContext sc = new SparkContext();
>>>>     sc.addJar("/usr/local/spark/external/kafka/target/scala-2.10/spark-streaming-kafka_2.10-1.1.0-SNAPSHOT.jar");
>>>>
>>>> Is there something I am missing?
>>>>
>>>> Thanks,
>>>> Dilip
>>>>
>>>>
>>>> On Friday 11 July 2014 12:02 PM, Akhil Das wrote:
>>>>
>>>> The easiest fix would be to add the Kafka jars to the SparkContext while
>>>> creating it.
>>>>
>>>> Thanks
>>>> Best Regards
>>>>
>>>>
>>>> On Fri, Jul 11, 2014 at 4:39 AM, Dilip <dilip_ram...@hotmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am trying to run a Spark Streaming program using Kafka on a
>>>>> standalone system.
>>>>> These are my details:
>>>>>
>>>>> Spark 1.0.0 hadoop2
>>>>> Scala 2.10.3
>>>>>
>>>>> I am trying a simple program using my custom sbt project, but this is
>>>>> the error I am getting:
>>>>>
>>>>> Exception in thread "main" java.lang.NoClassDefFoundError: kafka/serializer/StringDecoder
>>>>>     at org.apache.spark.streaming.kafka.KafkaUtils$.createStream(KafkaUtils.scala:55)
>>>>>     at org.apache.spark.streaming.kafka.KafkaUtils$.createStream(KafkaUtils.scala:94)
>>>>>     at org.apache.spark.streaming.kafka.KafkaUtils.createStream(KafkaUtils.scala)
>>>>>     at SimpleJavaApp.main(SimpleJavaApp.java:40)
>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>>>>     at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:303)
>>>>>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
>>>>>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>>> Caused by: java.lang.ClassNotFoundException: kafka.serializer.StringDecoder
>>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>>>     ... 11 more
>>>>>
>>>>> Here is my .sbt file:
>>>>>
>>>>> name := "Simple Project"
>>>>>
>>>>> version := "1.0"
>>>>>
>>>>> scalaVersion := "2.10.3"
>>>>>
>>>>> libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"
>>>>>
>>>>> libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.0.0"
>>>>>
>>>>> libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.0.0"
>>>>>
>>>>> libraryDependencies += "org.apache.spark" %% "spark-examples" % "1.0.0"
>>>>>
>>>>> libraryDependencies += "org.apache.spark" % "spark-streaming-kafka_2.10" % "1.0.0"
>>>>>
>>>>> libraryDependencies += "org.apache.kafka" %% "kafka" % "0.8.0"
>>>>>
>>>>> resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
>>>>>
>>>>> resolvers += "Maven Repository" at "http://central.maven.org/maven2/"
>>>>>
>>>>> sbt package was successful. I also tried sbt "++2.10.3 package" to
>>>>> build it for my Scala version. The problem remains the same.
>>>>> Can anyone help me out here? I've been stuck on this for quite some
>>>>> time now.
>>>>>
>>>>> Thank You,
>>>>> Dilip
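For completeness, Akhil's earlier suggestion of handing the Kafka jars to the SparkContext at creation time could look roughly like the sketch below. The jar paths, ZooKeeper quorum, consumer group, and topic name are placeholders; this only ships the jars with the application and is not a confirmed fix for the driver-side ClassNotFoundException above.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object SimpleKafkaApp {
  def main(args: Array[String]): Unit = {
    // Ship the Kafka-related jars with the application so executors can
    // resolve classes such as kafka.serializer.StringDecoder.
    // The paths below are placeholders.
    val conf = new SparkConf()
      .setAppName("SimpleKafkaApp")
      .setJars(Seq(
        "/path/to/spark-streaming-kafka_2.10-1.0.0.jar",
        "/path/to/kafka_2.10-0.8.0.jar"
      ))

    val ssc = new StreamingContext(conf, Seconds(2))

    // ZooKeeper quorum, group id, and topic map are placeholders.
    val messages = KafkaUtils.createStream(ssc, "localhost:2181", "example-group", Map("example-topic" -> 1))
      .map(_._2)

    messages.print()
    ssc.start()
    ssc.awaitTermination()
  }
}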