Can you give us your SBT project? Minus the source code if you don't wish to expose it.
TD

On Mon, Mar 16, 2015 at 12:54 PM, Kelly, Jonathan <jonat...@amazon.com> wrote:

> Yes, I do have the following dependencies marked as "provided":
>
> libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0" % "provided"
> libraryDependencies += "org.apache.spark" %% "spark-hive" % "1.3.0" % "provided"
> libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.3.0" % "provided"
> libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.3.0" % "provided"
>
> However, spark-streaming-kinesis-asl has a compile-time dependency on
> spark-streaming, so I think that causes it and its dependencies to be
> pulled into the assembly. I expected that simply excluding spark-streaming
> in the spark-streaming-kinesis-asl dependency would solve this problem, but
> it does not. That is, this doesn't work either:
>
> libraryDependencies += "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.3.0" exclude("org.apache.spark", "spark-streaming")
>
> As I mentioned originally, the following solved some but not all conflicts:
>
> libraryDependencies += "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.3.0" excludeAll(
>   ExclusionRule(organization = "org.apache.hadoop"),
>   ExclusionRule(organization = "org.apache.spark", name = "spark-streaming")
> )
>
> (Note that ExclusionRule(organization = "org.apache.spark") without the
> "name" attribute does not work, because that apparently causes it to exclude
> even spark-streaming-kinesis-asl itself.)
>
> Jonathan Kelly
> Elastic MapReduce - SDE
> Port 99 (SEA35) 08.220.C2
>
> From: Tathagata Das <t...@databricks.com>
> Date: Monday, March 16, 2015 at 12:45 PM
> To: Jonathan Kelly <jonat...@amazon.com>
> Cc: "user@spark.apache.org" <user@spark.apache.org>
> Subject: Re: problems with spark-streaming-kinesis-asl and "sbt assembly" ("different file contents found")
>
> If you are creating an assembly, make sure spark-streaming is marked as provided.
> spark-streaming is already part of the Spark installation, so it will
> be present at run time. That might solve some of these, maybe!?
>
> TD
>
> On Mon, Mar 16, 2015 at 11:30 AM, Kelly, Jonathan <jonat...@amazon.com> wrote:
>
>> I'm attempting to use the Spark Kinesis Connector, so I've added the
>> following dependency in my build.sbt:
>>
>> libraryDependencies += "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.3.0"
>>
>> My app works fine with "sbt run", but I can't seem to get "sbt assembly"
>> to work without failing with "different file contents found" errors due to
>> different versions of various packages getting pulled into the assembly.
>> This only occurs when I've added spark-streaming-kinesis-asl as a
>> dependency; "sbt assembly" works fine otherwise.
>>
>> Here are the conflicts that I see:
>>
>> com.esotericsoftware.kryo:kryo:2.21
>> com.esotericsoftware.minlog:minlog:1.2
>>
>> com.google.guava:guava:15.0
>> org.apache.spark:spark-network-common_2.10:1.3.0
>>
>> (Note: The conflict is with javac.sh; why is this even getting included?)
>> org.apache.spark:spark-streaming-kinesis-asl_2.10:1.3.0
>> org.apache.spark:spark-streaming_2.10:1.3.0
>> org.apache.spark:spark-core_2.10:1.3.0
>> org.apache.spark:spark-network-common_2.10:1.3.0
>> org.apache.spark:spark-network-shuffle_2.10:1.3.0
>>
>> (Note: I'm actually using my own custom-built version of Spark 1.3.0
>> where I've upgraded to v1.9.24 of the AWS Java SDK, but that has nothing
>> to do with all of these conflicts, as I upgraded the dependency *because*
>> I was getting all of these conflicts with the Spark 1.3.0 artifacts from
>> the central repo.)
>> com.amazonaws:aws-java-sdk-s3:1.9.24
>> net.java.dev.jets3t:jets3t:0.9.3
>>
>> commons-collections:commons-collections:3.2.1
>> commons-beanutils:commons-beanutils:1.7.0
>> commons-beanutils:commons-beanutils-core:1.8.0
>>
>> commons-logging:commons-logging:1.1.3
>> org.slf4j:jcl-over-slf4j:1.7.10
>>
>> (Note: The conflict is with a few package-info.class files, which seems
>> really silly.)
>> org.apache.hadoop:hadoop-yarn-common:2.4.0
>> org.apache.hadoop:hadoop-yarn-api:2.4.0
>>
>> (Note: The conflict is with org/apache/spark/unused/UnusedStubClass.class,
>> which seems even more silly.)
>> org.apache.spark:spark-streaming-kinesis-asl_2.10:1.3.0
>> org.apache.spark:spark-streaming_2.10:1.3.0
>> org.apache.spark:spark-core_2.10:1.3.0
>> org.apache.spark:spark-network-common_2.10:1.3.0
>> org.spark-project.spark:unused:1.0.0 (?!?!?!)
>> org.apache.spark:spark-network-shuffle_2.10:1.3.0
>>
>> I can get rid of some of the conflicts by using excludeAll() to exclude
>> artifacts with organization = "org.apache.hadoop", or with organization =
>> "org.apache.spark" and name = "spark-streaming", and I might be able to
>> resolve a few other conflicts this way, but the bottom line is that this
>> is way more complicated than it should be, so either something is really
>> broken or I'm just doing something wrong.
>>
>> Many of these don't even make sense to me. For example, the very first
>> conflict is between classes in com.esotericsoftware.kryo:kryo:2.21 and
>> com.esotericsoftware.minlog:minlog:1.2, but the former *depends* upon the
>> latter, so ??? It seems wrong to me that one package would contain
>> different versions of the same classes that are included in one of its
>> dependencies. I guess it wouldn't make too much difference, though, if I
>> could only get my assembly to include/exclude the right packages.
>> I of course don't want any of the Spark or Hadoop dependencies included
>> (other than spark-streaming-kinesis-asl itself), but I want all of
>> spark-streaming-kinesis-asl's dependencies included (such as the AWS Java
>> SDK and its dependencies). That doesn't seem to be possible without what
>> I imagine will become an unruly and fragile exclusion list, though.
>>
>> Thanks,
>>
>> Jonathan
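The duplicate-file conflicts described in this thread are usually resolved with a custom merge strategy in build.sbt rather than an ever-growing exclusion list. Below is a minimal sketch, assuming sbt-assembly 0.12+; the specific case patterns are illustrative, matching the duplicates reported above (UnusedStubClass.class, package-info.class, javac.sh), and should be adapted to your own conflict report:

```scala
// build.sbt -- resolve duplicate files inside the assembly instead of
// excluding whole artifacts.
assemblyMergeStrategy in assembly := {
  // Every Spark module ships the same placeholder stub class,
  // so any one copy will do.
  case PathList("org", "apache", "spark", "unused", "UnusedStubClass.class") =>
    MergeStrategy.first
  // package-info.class files only carry package-level annotations;
  // they are safe to drop.
  case PathList(ps @ _*) if ps.last == "package-info.class" =>
    MergeStrategy.discard
  // Stray build scripts bundled in some jars are not needed at run time.
  case "javac.sh" =>
    MergeStrategy.discard
  // Fall back to the plugin's default strategy for everything else.
  case x =>
    val defaultStrategy = (assemblyMergeStrategy in assembly).value
    defaultStrategy(x)
}
```

Note that MergeStrategy.first is only safe when the conflicting copies are interchangeable (as with identical stub classes); for genuinely different versions of the same class, such as the kryo/minlog overlap above, exclusions or shading are still needed.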