I have submitted a defect in JIRA for this: https://issues.apache.org/jira/browse/SPARK-3638 and a PR (https://github.com/apache/spark/pull/2489) that temporarily fixes the issue. Users would have to build Spark with the kinesis-asl profile to get the compatible httpclient added to the Spark assembly jar.
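For reference, picking up the interim fix means rebuilding the assembly with the kinesis-asl profile enabled. A typical invocation for a hadoop-2.4 + Hive build might look like the following (profile and property names follow the Spark 1.x build docs; adjust for your environment):

```shell
# Build a Spark assembly that includes the kinesis-asl module, so the
# httpclient version compatible with the AWS SDK lands in the assembly jar.
# Run from the root of a Spark source checkout.
mvn -Pkinesis-asl -Phadoop-2.4 -Phive -Dhadoop.version=2.4.0 -DskipTests clean package
```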
On 22 September 2014 15:00, 이인규(inQ) <gof...@gmail.com> wrote:
> Hello,
>
> In my case, I manually deleted the org/apache/http directory in the
> spark-assembly jar file. I think if we use the latest version of the
> httpclient (httpcore) library, we can resolve the problem.
> How about upgrading httpclient? (or jets3t?)
>
> 2014-09-11 19:09 GMT+09:00 Aniket Bhatnagar <aniket.bhatna...@gmail.com>:
>
>> Thanks everyone for weighing in on this.
>>
>> I had backported the Kinesis module from master to Spark 1.0.2, so just
>> to confirm I am not missing anything, I compared the dependency graph of
>> my Spark build with spark-master, and
>> org.apache.httpcomponents:httpclient:jar does seem to resolve to the
>> 4.1.2 dependency.
>>
>> I need Hive, so I can't really do a build without it. Even if I exclude
>> the httpclient dependency from my project's build, it will not solve the
>> problem, because the AWS SDK has been compiled against a newer version of
>> httpclient. My Spark Streaming project does not use httpclient directly.
>> The AWS SDK will look for the class
>> org.apache.http.impl.conn.DefaultClientConnectionOperator, and it will be
>> loaded from the spark-assembly jar regardless of how I package my project
>> (unless I am missing something?). I enabled verbose class loading to
>> confirm that the class is indeed loaded from the spark-assembly jar.
>>
>> The spark.files.userClassPathFirst option doesn't seem to work in my
>> Spark 1.0.2 build (not sure why).
>>
>> I was left with custom-building Spark and forcibly introducing the
>> latest httpclient version as a dependency.
>>
>> Finally, I tested this on 1.1.0-RC4 today, and it has the same issue.
>> Has anyone ever been able to get the Kinesis example to work with a
>> spark-hadoop2.4 (with Hive and YARN) build? I feel like this is a bug
>> that exists even in 1.1.0.
>>
>> I still believe we need a better solution to the dependency hell
>> problem. If OSGi is deemed too over the top, what solutions are being
>> investigated?
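As an aside, the "which jar did this class actually come from" question above can be answered without enabling verbose class loading for the whole JVM. A small, hypothetical diagnostic (the class and method names here are illustrative, not part of any Spark API) queries the class's code source directly:

```java
import java.security.CodeSource;

// Hypothetical helper: prints which jar or classpath entry a class was
// loaded from, as an alternative to running the JVM with -verbose:class.
public class WhichJar {
    public static String locationOf(String className) {
        try {
            Class<?> cls = Class.forName(className);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap class loader report no code source.
            return (src == null) ? "<bootstrap>" : src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "<not found>";
        }
    }

    public static void main(String[] args) {
        // On a real Spark classpath you would query
        // "org.apache.http.impl.conn.DefaultClientConnectionOperator";
        // java.lang.String stands in here so the snippet runs anywhere.
        System.out.println("java.lang.String -> " + locationOf("java.lang.String"));
        System.out.println("WhichJar         -> " + locationOf("WhichJar"));
    }
}
```

Run on a Spark driver/executor classpath, this would print the spark-assembly jar path for the conflicting httpclient class, confirming where it is resolved from.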
>>
>> On 6 September 2014 04:44, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>> > From the output of dependency:tree:
>> >
>> > [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ spark-streaming_2.10 ---
>> > [INFO] org.apache.spark:spark-streaming_2.10:jar:1.1.0-SNAPSHOT
>> > [INFO] +- org.apache.spark:spark-core_2.10:jar:1.1.0-SNAPSHOT:compile
>> > [INFO] |  +- org.apache.hadoop:hadoop-client:jar:2.4.0:compile
>> > ...
>> > [INFO] |  +- net.java.dev.jets3t:jets3t:jar:0.9.0:compile
>> > [INFO] |  |  +- commons-codec:commons-codec:jar:1.5:compile
>> > [INFO] |  |  +- org.apache.httpcomponents:httpclient:jar:4.1.2:compile
>> > [INFO] |  |  +- org.apache.httpcomponents:httpcore:jar:4.1.2:compile
>> >
>> > bq. excluding httpclient from spark-streaming dependency in your
>> > sbt/maven project
>> >
>> > This should work.
>> >
>> > On Fri, Sep 5, 2014 at 3:14 PM, Tathagata Das <tathagata.das1...@gmail.com> wrote:
>> >
>> >> If the httpclient dependency is coming from Hive, you could build
>> >> Spark without Hive. Alternatively, have you tried excluding httpclient
>> >> from the spark-streaming dependency in your sbt/maven project?
>> >>
>> >> TD
>> >>
>> >> On Thu, Sep 4, 2014 at 6:42 AM, Koert Kuipers <ko...@tresata.com> wrote:
>> >>
>> >> > Custom Spark builds should not be the answer, at least not if Spark
>> >> > ever wants to have a vibrant community for Spark apps.
>> >> >
>> >> > Spark does support a user-classpath-first option, which would deal
>> >> > with some of these issues, but I don't think it works.
>> >> >
>> >> > On Sep 4, 2014 9:01 AM, "Felix Garcia Borrego" <fborr...@gilt.com> wrote:
>> >> >
>> >> > > Hi,
>> >> > > I ran into the same issue, and apart from the ideas Aniket
>> >> > > mentioned, I could only find a nasty workaround: adding my own
>> >> > > custom PoolingClientConnectionManager to my classpath.
>> >> > >
>> >> > > http://stackoverflow.com/questions/24788949/nosuchmethoderror-while-running-aws-s3-client-on-spark-while-javap-shows-otherwi/25488955#25488955
>> >> > >
>> >> > > On Thu, Sep 4, 2014 at 11:43 AM, Sean Owen <so...@cloudera.com> wrote:
>> >> > >
>> >> > > > Dumb question -- are you using a Spark build that includes the
>> >> > > > Kinesis dependency? That build would have resolved conflicts like
>> >> > > > this for you. Your app would need to use the same version of the
>> >> > > > Kinesis client SDK, ideally.
>> >> > > >
>> >> > > > All of these ideas are well known, yes. In cases of super-common
>> >> > > > dependencies like Guava, they are already shaded. This is a less
>> >> > > > common source of conflicts, so I don't think httpclient is
>> >> > > > shaded, especially since it is not used directly by Spark. I
>> >> > > > think this is a case of your app conflicting with a third-party
>> >> > > > dependency?
>> >> > > >
>> >> > > > I think OSGi is deemed too over the top for things like this.
>> >> > > >
>> >> > > > On Thu, Sep 4, 2014 at 11:35 AM, Aniket Bhatnagar
>> >> > > > <aniket.bhatna...@gmail.com> wrote:
>> >> > > > > I am trying to use Kinesis as a source for Spark Streaming and
>> >> > > > > have run into a dependency issue that can't be resolved without
>> >> > > > > making my own custom Spark build. The issue is that Spark
>> >> > > > > transitively depends on
>> >> > > > > org.apache.httpcomponents:httpclient:jar:4.1.2 (I think because
>> >> > > > > of libfb303 coming from hbase and hive-serde), whereas the AWS
>> >> > > > > SDK depends on org.apache.httpcomponents:httpclient:jar:4.2.
>> >> > > > > When I package and run the Spark Streaming application, I get
>> >> > > > > the following:
>> >> > > > >
>> >> > > > > Caused by: java.lang.NoSuchMethodError:
>> >> > > > > org.apache.http.impl.conn.DefaultClientConnectionOperator.<init>(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V
>> >> > > > >   at org.apache.http.impl.conn.PoolingClientConnectionManager.createConnectionOperator(PoolingClientConnectionManager.java:140)
>> >> > > > >   at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:114)
>> >> > > > >   at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:99)
>> >> > > > >   at com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:29)
>> >> > > > >   at com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:97)
>> >> > > > >   at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:181)
>> >> > > > >   at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:119)
>> >> > > > >   at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:103)
>> >> > > > >   at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:136)
>> >> > > > >   at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:117)
>> >> > > > >   at com.amazonaws.services.kinesis.AmazonKinesisAsyncClient.<init>(AmazonKinesisAsyncClient.java:132)
>> >> > > > >
>> >> > > > > I can create a custom Spark build with
>> >> > > > > org.apache.httpcomponents:httpclient:jar:4.2 included in the
>> >> > > > > assembly, but I was wondering if this is something Spark devs
>> >> > > > > have noticed and are looking to resolve in upcoming releases.
>> >> > > > > Here are my thoughts on this issue:
>> >> > > > >
>> >> > > > > Containers that allow running custom user code often have to
>> >> > > > > resolve dependency issues in case of conflicts between the
>> >> > > > > framework's and the user code's dependencies. Here is how I
>> >> > > > > have seen some frameworks resolve the issue:
>> >> > > > > 1. Provide a child-first class loader: Some JEE containers
>> >> > > > > provide a child-first class loader that allows classes to be
>> >> > > > > loaded from user code first. I don't think this approach
>> >> > > > > completely solves the problem, as the framework is then
>> >> > > > > susceptible to class mismatch errors.
>> >> > > > > 2. Fold all dependencies into a sub-package: This approach
>> >> > > > > involves folding all dependencies into a project-specific
>> >> > > > > sub-package (like spark.dependencies). This approach is tedious
>> >> > > > > because it involves building custom versions of all
>> >> > > > > dependencies (and their transitive dependencies).
>> >> > > > > 3. Use something like OSGi: Some frameworks have successfully
>> >> > > > > used OSGi to manage dependencies between modules. The challenge
>> >> > > > > in this approach is to OSGify the framework and hide OSGi
>> >> > > > > complexities from the end user.
>> >> > > > >
>> >> > > > > My personal preference is OSGi (or at least some support for
>> >> > > > > OSGi), but I would love to hear what Spark devs are thinking in
>> >> > > > > terms of resolving the problem.
>> >> > > > >
>> >> > > > > Thanks,
>> >> > > > > Aniket
>> >> > > >
>> >> > > > ---------------------------------------------------------------------
>> >> > > > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> >> > > > For additional commands, e-mail: dev-h...@spark.apache.org
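On approach 2 above: an application author can get much of the same effect without rebuilding any dependencies by relocating the conflicting classes at package time with the maven-shade-plugin. A sketch of the relevant plugin configuration follows; the `myapp.shaded` package name is illustrative, not anything from this thread:

```xml
<!-- Sketch of approach 2 from the application side: relocate the app's own
     httpclient 4.2 copy into a private package. The shade plugin rewrites
     the AWS SDK's bytecode references as well, so at runtime the SDK loads
     the relocated 4.2 classes instead of the 4.1.2 ones baked into the
     Spark assembly. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>org.apache.http</pattern>
            <shadedPattern>myapp.shaded.org.apache.http</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

This is essentially what the Stack Overflow workaround linked earlier in the thread does, and it is the same technique Spark itself applies to super-common dependencies like Guava.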