Hi, Spark experts,
I have the following issue when using the AWS Java SDK in my Spark application. I narrowed it down to the following steps to reproduce the problem:

1) I have Spark 1.1.0 with Hadoop 2.4 installed on a 3-node cluster.
2) From the master node, I did the following:

    spark-shell --jars ws-java-sdk-1.7.2.jar

    import com.amazonaws.{Protocol, ClientConfiguration}
    import com.amazonaws.auth.BasicAWSCredentials
    import com.amazonaws.services.s3.AmazonS3Client
    val clientConfiguration = new ClientConfiguration()
    val s3accessKey="X"
    val s3secretKey="Y"
    val credentials = new BasicAWSCredentials(s3accessKey,s3secretKey)

    println("CLASSPATH="+System.getenv("CLASSPATH"))
    CLASSPATH=::/home/hadoop/spark/conf:/home/hadoop/spark/lib/spark-assembly-1.1.0-hadoop2.4.0.jar:/home/hadoop/conf:/home/hadoop/conf

    println("java.class.path="+System.getProperty("java.class.path"))
    java.class.path=::/home/hadoop/spark/conf:/home/hadoop/spark/lib/spark-assembly-1.1.0-hadoop2.4.0.jar:/home/hadoop/conf:/home/hadoop/conf

So far everything looks good and normal. But the following step fails, and it looks like the class loader can't resolve the right class. Any suggestions for a Spark application that requires the AWS SDK?

    scala> val s3Client = new AmazonS3Client(credentials, clientConfiguration)
    java.lang.NoClassDefFoundError: org/apache/http/impl/conn/PoolingClientConnectionManager
        at com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:26)
        at com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:96)
        at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:155)
        at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:119)
        at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:103)
        at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:334)
        at $iwC$$iwC$$iwC$$iwC.<init>(<console>:21)
        at $iwC$$iwC$$iwC.<init>(<console>:26)
        at $iwC$$iwC.<init>(<console>:28)
        at $iwC.<init>(<console>:30)
        at <init>(<console>:32)
        at .<init>(<console>:36)
        at .<clinit>(<console>)
        at .<init>(<console>:7)
        at .<clinit>(<console>)
        at $print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:789)
        at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1062)
        at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:615)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:646)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:610)
        at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:814)
        at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:859)
        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:771)
        at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:616)
        at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:624)
        at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:629)
        at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:954)
        at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:902)
        at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:902)
        at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:902)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:997)
        at org.apache.spark.repl.Main$.main(Main.scala:31)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.lang.ClassNotFoundException: org.apache.http.impl.conn.PoolingClientConnectionManager
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 46 more

Thanks.
Tian
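
P.S. In case it helps narrow this down, here is a rough sketch of a check that could be pasted into the same spark-shell session to see which jar, if any, provides the class named in the error. The helper name `whereIs` is hypothetical (not part of Spark or the AWS SDK), and it assumes the thread context class loader is the one the shell uses to resolve classes, which may not hold in every setup.

```scala
// Hypothetical helper: report the location a class would be loaded from,
// or "not found" if the class loader cannot see it at all.
def whereIs(className: String): String = {
  // A class is stored as a resource named like "a/b/C.class".
  val resource = className.replace('.', '/') + ".class"
  // Assumption: the context class loader is the relevant one in the shell;
  // fall back to the system class loader if none is set.
  val loader = Option(Thread.currentThread.getContextClassLoader)
    .getOrElse(ClassLoader.getSystemClassLoader)
  Option(loader.getResource(resource)).map(_.toString).getOrElse("not found")
}

// The class from the NoClassDefFoundError above.
println(whereIs("org.apache.http.impl.conn.PoolingClientConnectionManager"))
// Another HttpClient class, to see which jar HttpClient itself resolves to.
println(whereIs("org.apache.http.client.HttpClient"))
// The SDK class that was constructed when the error occurred.
println(whereIs("com.amazonaws.services.s3.AmazonS3Client"))
```

If the first line printed "not found" while the second pointed at some jar, that would suggest HttpClient is being picked up from a jar that does not contain the class the SDK needs. I have not confirmed that this is what happens here.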