Can you give us your SBT project, minus the source code if you don't wish
to expose it?
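In the meantime, one thing that sometimes papers over "different file
contents found" errors is an explicit merge strategy for the known-duplicate
files. A rough sketch (assuming sbt-assembly 0.13.x syntax; the specific
paths are guesses based on the conflicts you listed, and blindly taking the
first copy is only safe when the duplicates really are identical):

```scala
// build.sbt -- a sketch, not a verified fix: tell sbt-assembly how to
// resolve specific duplicate files instead of failing the assembly on them.
assemblyMergeStrategy in assembly := {
  // identical stub class shipped by several Spark modules -- keep one copy
  case PathList("org", "apache", "spark", "unused", "UnusedStubClass.class") =>
    MergeStrategy.first
  // javadoc package markers with no runtime behavior -- safe to drop
  case PathList(ps @ _*) if ps.last == "package-info.class" =>
    MergeStrategy.discard
  // build-time script that has no business in the assembly
  case "javac.sh" =>
    MergeStrategy.discard
  case x =>
    // fall back to the default strategy for everything else
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
```

This doesn't address real version conflicts (kryo vs. minlog, the guava
split), only the benign duplicates, so you'd still want the provided scoping
and exclusions for the rest.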

TD

On Mon, Mar 16, 2015 at 12:54 PM, Kelly, Jonathan <jonat...@amazon.com>
wrote:

>   Yes, I do have the following dependencies marked as "provided":
>
>  libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0" %
> "provided"
> libraryDependencies += "org.apache.spark" %% "spark-hive" % "1.3.0" %
> "provided"
> libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.3.0" %
> "provided"
> libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.3.0" %
> "provided"
>
>  However, spark-streaming-kinesis-asl has a compile-time dependency on
> spark-streaming, so I think that causes it and its dependencies to be
> pulled into the assembly.  I expected that simply excluding spark-streaming
> from the spark-streaming-kinesis-asl dependency would solve this problem,
> but it does not.  That is, this doesn't work either:
>
>  libraryDependencies += "org.apache.spark" %%
> "spark-streaming-kinesis-asl" % "1.3.0" exclude("org.apache.spark",
> "spark-streaming")
>
>  As I mentioned originally, the following solved some but not all
> conflicts:
>
>  libraryDependencies += "org.apache.spark" %%
> "spark-streaming-kinesis-asl" % "1.3.0" excludeAll(
>   ExclusionRule(organization = "org.apache.hadoop"),
>   ExclusionRule(organization = "org.apache.spark", name =
> "spark-streaming")
> )
>
>  (Note that ExclusionRule(organization = "org.apache.spark") without the
> "name" attribute does not work because that apparently causes it to exclude
> even spark-streaming-kinesis-asl.)
>
>
>  Jonathan Kelly
>
> Elastic MapReduce - SDE
>
> Port 99 (SEA35) 08.220.C2
>
>   From: Tathagata Das <t...@databricks.com>
> Date: Monday, March 16, 2015 at 12:45 PM
> To: Jonathan Kelly <jonat...@amazon.com>
> Cc: "user@spark.apache.org" <user@spark.apache.org>
> Subject: Re: problems with spark-streaming-kinesis-asl and "sbt assembly"
> ("different file contents found")
>
>   If you are creating an assembly, make sure spark-streaming is marked as
> provided. spark-streaming is already part of the Spark installation, so it
> will be present at run time. That might solve some of these, maybe!?
>
>  TD
>
> On Mon, Mar 16, 2015 at 11:30 AM, Kelly, Jonathan <jonat...@amazon.com>
> wrote:
>
>>  I'm attempting to use the Spark Kinesis Connector, so I've added the
>> following dependency in my build.sbt:
>>
>>  libraryDependencies += "org.apache.spark" %%
>> "spark-streaming-kinesis-asl" % "1.3.0"
>>
>>  My app works fine with "sbt run", but I can't seem to get "sbt
>> assembly" to work without failing with "different file contents found"
>> errors due to different versions of various packages getting pulled in to
>> the assembly.  This only occurs when I've added spark-streaming-kinesis-asl
>> as a dependency. "sbt assembly" works fine otherwise.
>>
>>  Here are the conflicts that I see:
>>
>>  com.esotericsoftware.kryo:kryo:2.21
>> com.esotericsoftware.minlog:minlog:1.2
>>
>>  com.google.guava:guava:15.0
>> org.apache.spark:spark-network-common_2.10:1.3.0
>>
>>  (Note: The conflict is with javac.sh; why is this even getting
>> included?)
>> org.apache.spark:spark-streaming-kinesis-asl_2.10:1.3.0
>> org.apache.spark:spark-streaming_2.10:1.3.0
>> org.apache.spark:spark-core_2.10:1.3.0
>> org.apache.spark:spark-network-common_2.10:1.3.0
>> org.apache.spark:spark-network-shuffle_2.10:1.3.0
>>
>>  (Note: I'm actually using my own custom-built version of Spark-1.3.0
>> where I've upgraded to v1.9.24 of the AWS Java SDK, but that has nothing to
>> do with all of these conflicts, as I upgraded the dependency *because* I
>> was getting all of these conflicts with the Spark 1.3.0 artifacts from the
>> central repo.)
>> com.amazonaws:aws-java-sdk-s3:1.9.24
>> net.java.dev.jets3t:jets3t:0.9.3
>>
>>  commons-collections:commons-collections:3.2.1
>> commons-beanutils-commons-beanutils:1.7.0
>> commons-beanutils:commons-beanutils-core:1.8.0
>>
>>  commons-logging:commons-logging:1.1.3
>> org.slf4j:jcl-over-slf4j:1.7.10
>>
>>  (Note: The conflict is with a few package-info.class files, which seems
>> really silly.)
>> org.apache.hadoop:hadoop-yarn-common:2.4.0
>> org.apache.hadoop:hadoop-yarn-api:2.4.0
>>
>>  (Note: The conflict is with org/apache/spark/unused/UnusedStubClass.class,
>> which seems even more silly.)
>> org.apache.spark:spark-streaming-kinesis-asl_2.10:1.3.0
>> org.apache.spark:spark-streaming_2.10:1.3.0
>> org.apache.spark:spark-core_2.10:1.3.0
>> org.apache.spark:spark-network-common_2.10:1.3.0
>> org.spark-project.spark:unused:1.0.0 (?!?!?!)
>> org.apache.spark:spark-network-shuffle_2.10:1.3.0
>>
>>  I can get rid of some of the conflicts by using excludeAll() to exclude
>> artifacts with organization = "org.apache.hadoop" or organization =
>> "org.apache.spark" and name = "spark-streaming", and I might be able to
>> resolve a few other conflicts this way, but the bottom line is that this is
>> way more complicated than it should be, so either something is really
>> broken or I'm just doing something wrong.
>>
>>  Many of these don't even make sense to me.  For example, the very first
>> conflict is between classes in com.esotericsoftware.kryo:kryo:2.21 and
>> com.esotericsoftware.minlog:minlog:1.2, but the former *depends* upon the
>> latter, so how can they even conflict?  It seems wrong that one package
>> would contain different versions of the same classes that are included in
>> one of its dependencies.  I guess it doesn't make much difference, though,
>> if I could only get my assembly to include/exclude the right packages.  I
>> of course don't want any of the Spark or Hadoop dependencies included
>> (other than spark-streaming-kinesis-asl itself), but I do want all of
>> spark-streaming-kinesis-asl's dependencies included (such as the AWS Java
>> SDK and its dependencies).  That doesn't seem to be possible without what
>> I imagine will become an unruly and fragile exclusion list.
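>>
>>  For what it's worth, to figure out which transitive path drags each
>> conflicting artifact in, I've been using the sbt-dependency-graph plugin
>> (the coordinates and version below are from memory, so treat them as
>> approximate and check the plugin's README):

```scala
// project/plugins.sbt -- hypothetical version; check for the current release.
addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.7.5")
```

>> Running "sbt dependency-tree" then prints the resolved dependency tree,
>> which at least makes it clear where, e.g., the second copy of
>> commons-beanutils is coming from before adding anything to the exclusion
>> list.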
>>
>>
>>  Thanks,
>>
>> Jonathan
>>
>
>
