Hello,

In my case, I manually deleted the org/apache/http directory from the
spark-assembly jar file. I think the problem can be resolved by using
the latest version of the httpclient (httpcore) library. How about
upgrading httpclient (or jets3t)?
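If you'd rather force the newer version from the application side
before touching the assembly jar, an sbt override along these lines
might be worth trying first -- a minimal sketch, assuming sbt 0.13
syntax and an illustrative 4.2.x version (as Aniket notes below, this
cannot help once the old classes are already baked into the
spark-assembly jar that the cluster loads):

// build.sbt sketch: pin httpclient/httpcore to the line the AWS SDK
// was compiled against (4.2.5 here is illustrative, not verified)
dependencyOverrides ++= Set(
  "org.apache.httpcomponents" % "httpclient" % "4.2.5",
  "org.apache.httpcomponents" % "httpcore"   % "4.2.5"
)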
2014-09-11 19:09 GMT+09:00 Aniket Bhatnagar <aniket.bhatna...@gmail.com>:

> Thanks everyone for weighing in on this.
>
> I had backported the kinesis module from master to Spark 1.0.2, so just
> to confirm I am not missing anything, I did a dependency graph compare
> of my Spark build against spark-master, and
> org.apache.httpcomponents:httpclient:jar does seem to resolve to the
> 4.1.2 dependency.
>
> I need Hive, so I can't really do a build without it. Even if I exclude
> the httpclient dependency from my project's build, that will not solve
> the problem, because the AWS SDK has been compiled against a newer
> version of httpclient. My Spark Streaming project does not use
> httpclient directly. The AWS SDK will look for the class
> org.apache.http.impl.conn.DefaultClientConnectionOperator, and it will
> be loaded from the spark-assembly jar regardless of how I package my
> project (unless I am missing something?). I enabled verbose class
> loading to confirm that the class is indeed loaded from the
> spark-assembly jar.
>
> The spark.files.userClassPathFirst option doesn't seem to be working on
> my Spark 1.0.2 build (not sure why).
>
> That left me with custom-building Spark and forcibly introducing the
> latest httpclient version as a dependency.
>
> Finally, I tested this on 1.1.0-RC4 today and it has the same issue.
> Has anyone ever been able to get the Kinesis example to work with a
> spark-hadoop2.4 (with Hive and YARN) build? I feel like this is a bug
> that exists even in 1.1.0.
>
> I still believe we need a better solution to the dependency hell
> problem. If OSGi is deemed too over the top, what are the solutions
> being investigated?
>
> On 6 September 2014 04:44, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > From the output of dependency:tree:
> >
> > [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ spark-streaming_2.10 ---
> > [INFO] org.apache.spark:spark-streaming_2.10:jar:1.1.0-SNAPSHOT
> > [INFO] +- org.apache.spark:spark-core_2.10:jar:1.1.0-SNAPSHOT:compile
> > [INFO] |  +- org.apache.hadoop:hadoop-client:jar:2.4.0:compile
> > ...
> > [INFO] |  +- net.java.dev.jets3t:jets3t:jar:0.9.0:compile
> > [INFO] |  |  +- commons-codec:commons-codec:jar:1.5:compile
> > [INFO] |  |  +- org.apache.httpcomponents:httpclient:jar:4.1.2:compile
> > [INFO] |  |  +- org.apache.httpcomponents:httpcore:jar:4.1.2:compile
> >
> > bq. excluding httpclient from spark-streaming dependency in your
> > sbt/maven project
> >
> > This should work.
> >
> > On Fri, Sep 5, 2014 at 3:14 PM, Tathagata Das
> > <tathagata.das1...@gmail.com> wrote:
> >
> > > If the httpClient dependency is coming from Hive, you could build
> > > Spark without Hive. Alternatively, have you tried excluding
> > > httpclient from the spark-streaming dependency in your sbt/maven
> > > project?
> > >
> > > TD
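(For reference, the exclusion TD suggests would look roughly like the
sketch below in an sbt build. This is an assumption on my side, not
something verified against the exact builds in this thread, and as
Aniket notes above it may not help once the conflicting classes are
already inside the spark-assembly jar.)

// build.sbt sketch: keep spark-streaming but drop its transitive
// httpclient so the AWS SDK's newer 4.2.x copy wins in the app jar
libraryDependencies += ("org.apache.spark" %% "spark-streaming" % "1.0.2")
  .exclude("org.apache.httpcomponents", "httpclient")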
> > > On Thu, Sep 4, 2014 at 6:42 AM, Koert Kuipers <ko...@tresata.com> wrote:
> > >
> > > > Custom Spark builds should not be the answer, at least not if
> > > > Spark ever wants to have a vibrant community for Spark apps.
> > > >
> > > > Spark does support a user-classpath-first option, which would
> > > > deal with some of these issues, but I don't think it works.
> > > >
> > > > On Sep 4, 2014 9:01 AM, "Felix Garcia Borrego"
> > > > <fborr...@gilt.com> wrote:
> > > >
> > > > > Hi,
> > > > > I ran into the same issue, and apart from the ideas Aniket
> > > > > mentioned, I could only find a nasty workaround: adding my own
> > > > > custom PoolingClientConnectionManager to my classpath.
> > > > >
> > > > > http://stackoverflow.com/questions/24788949/nosuchmethoderror-while-running-aws-s3-client-on-spark-while-javap-shows-otherwi/25488955#25488955
> > > > >
> > > > > On Thu, Sep 4, 2014 at 11:43 AM, Sean Owen <so...@cloudera.com> wrote:
> > > > >
> > > > > > Dumb question -- are you using a Spark build that includes
> > > > > > the Kinesis dependency? That build would have resolved
> > > > > > conflicts like this for you. Your app would need to use the
> > > > > > same version of the Kinesis client SDK, ideally.
> > > > > >
> > > > > > All of these ideas are well known, yes. In the case of
> > > > > > super-common dependencies like Guava, they are already
> > > > > > shaded. This is a less common source of conflicts, so I don't
> > > > > > think httpclient is shaded, especially since it is not used
> > > > > > directly by Spark. I think this is a case of your app
> > > > > > conflicting with a third-party dependency?
> > > > > >
> > > > > > I think OSGi is deemed too over the top for things like this.
> > > > > >
> > > > > > On Thu, Sep 4, 2014 at 11:35 AM, Aniket Bhatnagar
> > > > > > <aniket.bhatna...@gmail.com> wrote:
> > > > > >
> > > > > > > I am trying to use Kinesis as a source for Spark Streaming
> > > > > > > and have run into a dependency issue that can't be resolved
> > > > > > > without making my own custom Spark build. The issue is that
> > > > > > > Spark is transitively dependent on
> > > > > > > org.apache.httpcomponents:httpclient:jar:4.1.2 (I think
> > > > > > > because of libfb303 coming from hbase and hive-serde),
> > > > > > > whereas the AWS SDK is dependent on
> > > > > > > org.apache.httpcomponents:httpclient:jar:4.2. When I
> > > > > > > package and run my Spark Streaming application, I get the
> > > > > > > following:
> > > > > > >
> > > > > > > Caused by: java.lang.NoSuchMethodError: org.apache.http.impl.conn.DefaultClientConnectionOperator.<init>(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V
> > > > > > >   at org.apache.http.impl.conn.PoolingClientConnectionManager.createConnectionOperator(PoolingClientConnectionManager.java:140)
> > > > > > >   at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:114)
> > > > > > >   at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:99)
> > > > > > >   at com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:29)
> > > > > > >   at com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:97)
> > > > > > >   at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:181)
> > > > > > >   at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:119)
> > > > > > >   at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:103)
> > > > > > >   at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:136)
> > > > > > >   at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:117)
> > > > > > >   at com.amazonaws.services.kinesis.AmazonKinesisAsyncClient.<init>(AmazonKinesisAsyncClient.java:132)
> > > > > > >
> > > > > > > I can create a custom Spark build with
> > > > > > > org.apache.httpcomponents:httpclient:jar:4.2 included in
> > > > > > > the assembly, but I was wondering whether this is something
> > > > > > > the Spark devs have noticed and are looking to resolve in
> > > > > > > upcoming releases. Here are my thoughts on this issue:
> > > > > > >
> > > > > > > Containers that run custom user code often have to resolve
> > > > > > > dependency conflicts between the framework's and the user
> > > > > > > code's dependencies. Here is how I have seen some
> > > > > > > frameworks resolve the issue:
> > > > > > > 1. Provide a child-first class loader: some JEE containers
> > > > > > > provided a child-first class loader that allowed classes to
> > > > > > > be loaded from user code first. I don't think this approach
> > > > > > > completely solves the problem, as the framework then
> > > > > > > becomes susceptible to class mismatch errors.
> > > > > > > 2. Fold all dependencies into a sub-package: this approach
> > > > > > > relocates all dependencies into a project-specific
> > > > > > > sub-package (like spark.dependencies). It is tedious
> > > > > > > because it involves building custom versions of all
> > > > > > > dependencies (and their transitive dependencies).
> > > > > > > 3. Use something like OSGi: some frameworks have
> > > > > > > successfully used OSGi to manage dependencies between
> > > > > > > modules. The challenge in this approach is to OSGify the
> > > > > > > framework and hide OSGi's complexities from the end user.
> > > > > > >
> > > > > > > My personal preference is OSGi (or at least some support
> > > > > > > for OSGi), but I would love to hear what the Spark devs are
> > > > > > > thinking in terms of resolving this problem.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Aniket
> > > > > >
> > > > > > ---------------------------------------------------------------------
> > > > > > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> > > > > > For additional commands, e-mail: dev-h...@spark.apache.org
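P.S. Regarding option 2 in Aniket's list: build tools can do that
sub-package relocation mechanically at assembly time, so you don't have
to custom-build every dependency by hand. A minimal sketch using
sbt-assembly's shade rules (an assumption on my part -- it requires a
sbt-assembly release with shading support, which may be newer than the
builds discussed here, and spark.dependencies is just the example
prefix from the list above):

// in the application's assembly settings: rename httpclient's packages
// so they can never clash with the copies inside spark-assembly
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("org.apache.http.**" -> "spark.dependencies.http.@1").inAll
)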