[
https://issues.apache.org/jira/browse/SPARK-10789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073832#comment-15073832
]
Roi Reshef edited comment on SPARK-10789 at 12/29/15 11:56 AM:
---------------------------------------------------------------
Thanks [~jonathak]. That requires rebuilding Spark and redistributing it across
my cluster, right? I finally figured out a way to import external jars without
rebuilding Spark. One can set two configuration properties in spark-env.sh (at
least for the Netlib package, which includes *.jar and *.so files):
spark.{driver,executor}.extraClassPath - for *.jar
spark.{driver,executor}.extraLibraryPath - for *.so
Spark (I'm using v1.5.2) will then pick them up automatically.
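To illustrate (a sketch only: these spark.* properties are normally read from
conf/spark-defaults.conf, and all paths below are hypothetical placeholders for
wherever the Netlib jar and native libraries actually live on your nodes):

    spark.driver.extraClassPath      /opt/netlib/netlib-all.jar
    spark.executor.extraClassPath    /opt/netlib/netlib-all.jar
    spark.driver.extraLibraryPath    /opt/netlib/native
    spark.executor.extraLibraryPath  /opt/netlib/native

The same settings can also be passed per job on the command line, e.g.:

    spark-submit --conf spark.driver.extraClassPath=/opt/netlib/netlib-all.jar ...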
> Cluster mode SparkSubmit classpath only includes Spark assembly
> ---------------------------------------------------------------
>
> Key: SPARK-10789
> URL: https://issues.apache.org/jira/browse/SPARK-10789
> Project: Spark
> Issue Type: Bug
> Components: Spark Submit
> Affects Versions: 1.5.0, 1.6.0
> Reporter: Jonathan Kelly
> Attachments: SPARK-10789.diff, SPARK-10789.v1.6.0.diff
>
>
> When using cluster deploy mode, the classpath of the SparkSubmit process that
> gets launched only includes the Spark assembly and not
> spark.driver.extraClassPath. This is of course by design, since the driver
> actually runs on the cluster and not inside the SparkSubmit process.
> However, if the SparkSubmit process, minimal as it may be, needs any extra
> libraries that are not part of the Spark assembly, there is no good way to
> include them. (I say "no good way" because including them in the
> SPARK_CLASSPATH environment variable does cause the SparkSubmit process to
> include them, but this is not acceptable because this environment variable
> has long been deprecated, and it prevents the use of
> spark.driver.extraClassPath.)
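> For illustration, the deprecated workaround amounts to something like this in
> spark-env.sh (the path is a hypothetical placeholder, not EMR's actual layout):
>
>     export SPARK_CLASSPATH="/path/to/extra/libs/*"
>
> As noted above, once SPARK_CLASSPATH is set, spark.driver.extraClassPath can
> no longer be used alongside it.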
> An example of when this matters is on Amazon EMR when using an S3 path for
> the application JAR and running in yarn-cluster mode. The SparkSubmit process
> needs the EmrFileSystem implementation and its dependencies in the classpath
> in order to download the application JAR from S3, so it fails with a
> ClassNotFoundException. (EMR currently gets around this by setting
> SPARK_CLASSPATH, but as mentioned above this is less than ideal.)
> I have tried modifying SparkSubmitCommandBuilder to include the driver extra
> classpath whether it's client mode or cluster mode, and this seems to work,
> but I don't know if there is any downside to this.
> Example that fails on emr-4.0.0 (if you switch to setting
> spark.{driver,executor}.extraClassPath instead of SPARK_CLASSPATH):
> spark-submit --deploy-mode cluster --class org.apache.spark.examples.JavaWordCount s3://my-bucket/spark-examples.jar s3://my-bucket/word-count-input.txt
> Resulting Exception:
> Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found
>         at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2074)
>         at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2626)
>         at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2639)
>         at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:90)
>         at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2678)
>         at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2660)
>         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:374)
>         at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
>         at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:233)
>         at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:327)
>         at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$5.apply(Client.scala:366)
>         at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$5.apply(Client.scala:364)
>         at scala.collection.immutable.List.foreach(List.scala:318)
>         at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:364)
>         at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:629)
>         at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:119)
>         at org.apache.spark.deploy.yarn.Client.run(Client.scala:907)
>         at org.apache.spark.deploy.yarn.Client$.main(Client.scala:966)
>         at org.apache.spark.deploy.yarn.Client.main(Client.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.ClassNotFoundException: Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found
>         at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1980)
>         at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2072)
>         ... 27 more