Have you tried marking only spark-streaming-kinesis-asl as not provided,
and the rest as provided? Then you will not even need to pass the
kinesis-asl jar to spark-submit.
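
As a sketch (versions taken from the build.sbt quoted below; untested):

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.3.0" % "provided",
  "org.apache.spark" %% "spark-streaming" % "1.3.0" % "provided",
  // no "provided" here, so sbt-assembly bundles the Kinesis receiver
  // and its AWS SDK dependencies into the assembly jar
  "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.3.0"
)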

TD

On Tue, Apr 14, 2015 at 2:27 PM, Mike Trienis <mike.trie...@orcsol.com>
wrote:

> Richard,
>
> Your response was very helpful and actually resolved my issue. In case
> others run into a similar issue, here is the procedure I followed:
>
>    - Upgraded to Spark 1.3.0
>    - Marked all Spark-related libraries as "provided"
>    - Included Spark's transitive library dependencies explicitly
>
> where my build.sbt file contains:
>
> libraryDependencies ++= {
>   Seq(
>     "org.apache.spark" %% "spark-core" % "1.3.0" % "provided",
>     "org.apache.spark" %% "spark-streaming" % "1.3.0" % "provided",
>     "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.3.0" % "provided",
>     "joda-time" % "joda-time" % "2.2",
>     "org.joda" % "joda-convert" % "1.2",
>     "com.amazonaws" % "aws-java-sdk" % "1.8.3",
>     "com.amazonaws" % "amazon-kinesis-client" % "1.2.0")
> }
>
> and submitting a Spark job can be done via:
>
> sh ./spark-1.3.0-bin-cdh4/bin/spark-submit \
>     --jars spark-streaming-kinesis-asl_2.10-1.3.0.jar \
>     --verbose --class com.xxx.MyClass \
>     target/scala-2.10/xxx-assembly-0.1-SNAPSHOT.jar
>
> Thanks again, Richard!
>
> Cheers, Mike.
>
>
> On Tue, Apr 14, 2015 at 11:01 AM, Richard Marscher <rmarsc...@localytics.com> wrote:
>
>> Hi,
>>
>> I've gotten an application working with sbt-assembly and Spark, so I
>> thought I'd present an option. In my experience, trying to bundle any of
>> the Spark libraries in your uber jar is a major pain. There are a lot of
>> deduplication errors to work through, and even if you resolve them it is
>> easy to do so incorrectly. I considered it an intractable problem. So the
>> alternative is to not include those jars in your uber jar. For this to
>> work, you will need the same libraries on the classpath of both your
>> Spark cluster and your driver program (if you are running the driver as a
>> standalone application rather than through spark-submit).
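>>
>> As a rough sketch of the spark-submit case (the jar names and paths are
>> made up for illustration), the shared libraries travel alongside the
>> uber jar instead of inside it:
>>
>> spark-submit \
>>   --class com.example.MyApp \
>>   --jars lib/shared-dep.jar \
>>   --driver-class-path lib/shared-dep.jar \
>>   target/scala-2.10/myapp-assembly.jar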
>>
>> As for your NoClassDefFoundError, you are either missing Joda Time in
>> your runtime classpath or have conflicting versions. It looks like
>> something related to AWS wants to use it. Check whether your uber jar
>> includes org/joda/time, and check the classpath of your Spark cluster as
>> well. For example: I use Spark 1.3.0 on Hadoop 1.x, which in the 'lib'
>> directory has an uber jar, spark-assembly-1.3.0-hadoop1.0.4.jar. At one
>> point in Spark 1.2 I found a conflict between the httpclient version that
>> my uber jar pulled in for the AWS libraries and the one bundled in the
>> Spark uber jar. I hand-patched the Spark uber jar to remove the offending
>> httpclient bytecode to resolve the issue. You may be facing a similar
>> situation.
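>>
>> For reference, a sketch of the commands involved (the file names are
>> assumptions, and deleting entries from the Spark assembly is a last
>> resort):
>>
>> # list the uber jar's contents and look for Joda Time classes
>> jar tf target/scala-2.10/myapp-assembly.jar | grep org/joda/time
>>
>> # hand-patch a jar by deleting the conflicting entries
>> zip -d spark-assembly-1.3.0-hadoop1.0.4.jar 'org/apache/http/*'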
>>
>> I hope that gives some ideas for resolving your issue.
>>
>> Regards,
>> Rich
>>
>> On Tue, Apr 14, 2015 at 1:14 PM, Mike Trienis <mike.trie...@orcsol.com>
>> wrote:
>>
>>> Hi Vadim,
>>>
>>> After removing "provided" from "org.apache.spark" %%
>>> "spark-streaming-kinesis-asl", I ended up with a huge number of
>>> deduplicate errors:
>>>
>>> https://gist.github.com/trienism/3d6f8d6b7ff5b7cead6a
>>>
>>> It would be nice if you could share some pieces of your mergeStrategy
>>> code for reference.
>>>
>>> Also, after adding "provided" back to "spark-streaming-kinesis-asl",
>>> I submitted the Spark job with the spark-streaming-kinesis-asl jar file
>>> included:
>>>
>>> sh /usr/lib/spark/bin/spark-submit --verbose \
>>>     --jars lib/spark-streaming-kinesis-asl_2.10-1.2.0.jar \
>>>     --class com.xxx.DataConsumer \
>>>     target/scala-2.10/xxx-assembly-0.1-SNAPSHOT.jar
>>>
>>> I still end up with the following error...
>>>
>>> Exception in thread "main" java.lang.NoClassDefFoundError: org/joda/time/format/DateTimeFormat
>>>     at com.amazonaws.auth.AWS4Signer.<clinit>(AWS4Signer.java:44)
>>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>>     at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>>     at java.lang.Class.newInstance(Class.java:379)
>>>
>>> Has anyone else run into this issue?
>>>
>>>
>>>
>>> On Mon, Apr 13, 2015 at 6:46 PM, Vadim Bichutskiy <vadim.bichuts...@gmail.com> wrote:
>>>
>>>> I don't believe the Kinesis ASL dependency should be "provided". I
>>>> used mergeStrategy successfully to produce an "uber jar."
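>>>>
>>>> A minimal sketch of the kind of merge strategy that handles these
>>>> duplicates (sbt-assembly 0.12+ syntax; the specific cases are
>>>> assumptions for illustration, not my exact config):
>>>>
>>>> assemblyMergeStrategy in assembly := {
>>>>   // manifests must not be merged; keep only the one sbt generates
>>>>   case PathList("META-INF", "MANIFEST.MF") => MergeStrategy.discard
>>>>   // merge service registration files line by line
>>>>   case PathList("META-INF", "services", xs @ _*) => MergeStrategy.filterDistinctLines
>>>>   // drop the remaining META-INF files (licenses, pom files, etc.)
>>>>   case PathList("META-INF", xs @ _*) => MergeStrategy.discard
>>>>   // for everything else, arbitrarily keep the first copy found
>>>>   case _ => MergeStrategy.first
>>>> }
>>>>
>>>> The risky case is MergeStrategy.first: it silently picks one copy, so
>>>> it should only apply to files where any copy will do.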
>>>>
>>>> FYI, I've been trying to consume data out of Kinesis with Spark, so
>>>> far with no success :(
>>>> Would be curious to know if you got it working.
>>>>
>>>> Vadim
>>>>
>>>> On Apr 13, 2015, at 9:36 PM, Mike Trienis <mike.trie...@orcsol.com>
>>>> wrote:
>>>>
>>>> Hi All,
>>>>
>>>> I am having trouble building a fat jar file through sbt-assembly.
>>>>
>>>> [warn] Merging 'META-INF/NOTICE.txt' with strategy 'rename'
>>>> [warn] Merging 'META-INF/NOTICE' with strategy 'rename'
>>>> [warn] Merging 'META-INF/LICENSE.txt' with strategy 'rename'
>>>> [warn] Merging 'META-INF/LICENSE' with strategy 'rename'
>>>> [warn] Merging 'META-INF/MANIFEST.MF' with strategy 'discard'
>>>> [warn] Merging 'META-INF/maven/com.thoughtworks.paranamer/paranamer/pom.properties' with strategy 'discard'
>>>> [warn] Merging 'META-INF/maven/com.thoughtworks.paranamer/paranamer/pom.xml' with strategy 'discard'
>>>> [warn] Merging 'META-INF/maven/commons-dbcp/commons-dbcp/pom.properties' with strategy 'discard'
>>>> [warn] Merging 'META-INF/maven/commons-dbcp/commons-dbcp/pom.xml' with strategy 'discard'
>>>> [warn] Merging 'META-INF/maven/commons-pool/commons-pool/pom.properties' with strategy 'discard'
>>>> [warn] Merging 'META-INF/maven/commons-pool/commons-pool/pom.xml' with strategy 'discard'
>>>> [warn] Merging 'META-INF/maven/joda-time/joda-time/pom.properties' with strategy 'discard'
>>>> [warn] Merging 'META-INF/maven/joda-time/joda-time/pom.xml' with strategy 'discard'
>>>> [warn] Merging 'META-INF/maven/log4j/log4j/pom.properties' with strategy 'discard'
>>>> [warn] Merging 'META-INF/maven/log4j/log4j/pom.xml' with strategy 'discard'
>>>> [warn] Merging 'META-INF/maven/org.joda/joda-convert/pom.properties' with strategy 'discard'
>>>> [warn] Merging 'META-INF/maven/org.joda/joda-convert/pom.xml' with strategy 'discard'
>>>> [warn] Merging 'META-INF/maven/org.slf4j/slf4j-api/pom.properties' with strategy 'discard'
>>>> [warn] Merging 'META-INF/maven/org.slf4j/slf4j-api/pom.xml' with strategy 'discard'
>>>> [warn] Merging 'META-INF/maven/org.slf4j/slf4j-log4j12/pom.properties' with strategy 'discard'
>>>> [warn] Merging 'META-INF/maven/org.slf4j/slf4j-log4j12/pom.xml' with strategy 'discard'
>>>> [warn] Merging 'META-INF/services/java.sql.Driver' with strategy 'filterDistinctLines'
>>>> [warn] Merging 'rootdoc.txt' with strategy 'concat'
>>>> [warn] Strategy 'concat' was applied to a file
>>>> [warn] Strategy 'discard' was applied to 17 files
>>>> [warn] Strategy 'filterDistinctLines' was applied to a file
>>>> [warn] Strategy 'rename' was applied to 4 files
>>>>
>>>> When submitting the Spark application through the command:
>>>>
>>>> sh /usr/lib/spark/bin/spark-submit --class com.xxx.ExampleClassName \
>>>>     target/scala-2.10/xxxx-snapshot.jar
>>>>
>>>> I end up with the following error:
>>>>
>>>> Exception in thread "main" java.lang.NoClassDefFoundError: org/joda/time/format/DateTimeFormat
>>>>     at com.amazonaws.auth.AWS4Signer.<clinit>(AWS4Signer.java:44)
>>>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>>>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>>>     at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>>>     at java.lang.Class.newInstance(Class.java:379)
>>>>     at com.amazonaws.auth.SignerFactory.createSigner(SignerFactory.java:119)
>>>>     at com.amazonaws.auth.SignerFactory.lookupAndCreateSigner(SignerFactory.java:105)
>>>>     at com.amazonaws.auth.SignerFactory.getSigner(SignerFactory.java:78)
>>>>     at com.amazonaws.AmazonWebServiceClient.computeSignerByServiceRegion(AmazonWebServiceClient.java:307)
>>>>     at com.amazonaws.AmazonWebServiceClient.computeSignerByURI(AmazonWebServiceClient.java:280)
>>>>     at com.amazonaws.AmazonWebServiceClient.setEndpoint(AmazonWebServiceClient.java:160)
>>>>     at com.amazonaws.services.kinesis.AmazonKinesisClient.setEndpoint(AmazonKinesisClient.java:2102)
>>>>     at com.amazonaws.services.kinesis.AmazonKinesisClient.init(AmazonKinesisClient.java:216)
>>>>     at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:202)
>>>>     at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:175)
>>>>     at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:155)
>>>>     at com.quickstatsengine.aws.AwsProvider$.<init>(AwsProvider.scala:20)
>>>>     at com.quickstatsengine.aws.AwsProvider$.<clinit>(AwsProvider.scala)
>>>>
>>>> The snippet from my build.sbt file is:
>>>>
>>>>     "org.apache.spark" %% "spark-core" % "1.2.0" % "provided",
>>>>     "org.apache.spark" %% "spark-streaming" % "1.2.0" % "provided",
>>>>     "com.datastax.spark" %% "spark-cassandra-connector" % "1.2.0-alpha1" % "provided",
>>>>     "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.2.0" % "provided",
>>>>
>>>> And the error is originating from:
>>>>
>>>> import com.amazonaws.auth.DefaultAWSCredentialsProviderChain
>>>> import com.amazonaws.services.kinesis.AmazonKinesisClient
>>>>
>>>> val kinesisClient = new AmazonKinesisClient(new DefaultAWSCredentialsProviderChain())
>>>>
>>>> Am I correct to set spark-streaming-kinesis-asl as a *provided* dependency?
>>>> Also, is there a merge strategy I need to apply?
>>>>
>>>> Any help would be appreciated, Mike.
>>>>
>>>>
>>>>
>>>
>>
>
