Since the jars are already on HDFS, you can access them directly in your
Spark application without using --jars
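One way to do that (just a sketch, and the path below is only illustrative) is to register the jar from the application itself; SparkContext.addJar accepts an hdfs:// URI and the executors fetch the jar on their own:

    import org.apache.spark.{SparkConf, SparkContext}

    object DirectHdfsJar {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("direct-hdfs-jar"))
        // Executors fetch the jar from HDFS themselves when tasks run.
        sc.addJar("hdfs://ip/1.jar")
        // ... job code that uses classes from 1.jar ...
        sc.stop()
      }
    }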
Cheng
On 6/11/15 11:04 AM, Dong Lei wrote:
Hi spark-dev:
I cannot use an HDFS location for the “--jars” or “--files” option
when doing a spark-submit in standalone cluster mode. For example:
spark-submit … --jars hdfs://ip/1.jar … hdfs://ip/app.jar (standalone cluster mode)
will not download 1.jar to the driver’s HTTP file server (though app.jar
will be downloaded to the driver’s directory).
I figured out that the reason Spark does not download the jars is that
when sc.addJar adds them to the HTTP file server, the function called is
Files.copy, which does not support remote locations (a rough sketch of
that copy is below).
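Roughly, the copy looks like the following (simplified from HttpFileServer, so treat the exact shape as an approximation). Guava’s Files.copy only works on local java.io.File arguments, so an hdfs:// path never reaches HDFS:

    import java.io.File
    import com.google.common.io.Files

    def addFileToDir(file: File, dir: File): String = {
      // “hdfs://ip/1.jar” wrapped in new File(...) is treated as a local
      // path that does not exist, so the remote jar is never copied or served.
      Files.copy(file, new File(dir, file.getName))
      new File(dir, file.getName).getPath
    }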
And I think even if Spark downloaded the jars and added them to the HTTP
file server, the classpath would still not be set correctly, because it
would still contain the remote locations.
So I’m trying to make it work and have come up with two options, but
neither of them seems elegant, and I would like to hear your advice:
Option 1:
Modify HttpFileServer.addFileToDir so that it recognizes an “hdfs” prefix
(a rough sketch follows). This is not ideal because I think it goes
beyond the scope of the HTTP file server.
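Something like this is what I have in mind (only a sketch; the method name addPathToDir and the Hadoop configuration wiring are my own assumptions):

    import java.io.File
    import java.net.URI
    import com.google.common.io.Files
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    def addPathToDir(path: String, dir: File, hadoopConf: Configuration): String = {
      val uri = new URI(path)
      val target = new File(dir, new Path(path).getName)
      if (uri.getScheme == "hdfs") {
        // Pull the remote jar down before the HTTP file server serves it.
        FileSystem.get(uri, hadoopConf)
          .copyToLocalFile(new Path(path), new Path(target.getAbsolutePath))
      } else {
        Files.copy(new File(uri.getPath), target)
      }
      target.getPath
    }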
Option 2:
Modify DriverRunner.downloadUserJar so that it downloads all the “--jars”
and “--files” along with the application jar. This sounds more reasonable
than option 1 for downloading files. But then I need to read “spark.jars”
and “spark.files” in downloadUserJar or DriverRunner.start and replace
them with local paths (see the sketch below). How can I do that?
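Roughly what I mean is something like this (everything below is assumed, not actual DriverRunner code; the helper name localizeJars is mine):

    import java.io.File
    import java.net.URI
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.SparkConf

    def localizeJars(conf: SparkConf, driverDir: File, hadoopConf: Configuration): Unit = {
      val localized = conf.get("spark.jars", "").split(",").filter(_.nonEmpty).map { jar =>
        val uri = new URI(jar)
        if (uri.getScheme == "hdfs") {
          // Fetch the remote jar next to the user jar in the driver's work dir.
          val dest = new File(driverDir, new Path(jar).getName)
          FileSystem.get(uri, hadoopConf)
            .copyToLocalFile(new Path(jar), new Path(dest.getAbsolutePath))
          dest.getAbsolutePath
        } else {
          jar  // already local, leave it unchanged
        }
      }
      // Rewrite spark.jars so the driver's classpath only contains local paths.
      conf.set("spark.jars", localized.mkString(","))
    }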
Do you have a more elegant solution, or is there a plan to support this
in the future?
Thanks
Dong Lei