Ben Mayne created SPARK-21618:
---------------------------------
Summary: http(s) not accepted in spark-submit jar uri
Key: SPARK-21618
URL: https://issues.apache.org/jira/browse/SPARK-21618
Project: Spark
Issue Type: Bug
Components: Deploy
Affects Versions: 2.2.0, 2.1.1
Environment: pre-built for Hadoop 2.6 and 2.7, on Mac and Ubuntu 16.04.
Reporter: Ben Mayne
Priority: Minor
The documentation suggests I should be able to use an http(s) URI for a JAR in
spark-submit, but I haven't been successful:
https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management
{noformat}
benmayne@Benjamins-MacBook-Pro ~ $ spark-submit --deploy-mode client --master local[2] --class class.name.Test https://test.com/path/to/jar.jar
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.io.IOException: No FileSystem for scheme: https
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2586)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2593)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
    at org.apache.spark.deploy.SparkSubmit$.downloadFile(SparkSubmit.scala:865)
    at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$1.apply(SparkSubmit.scala:316)
    at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$1.apply(SparkSubmit.scala:316)
    at scala.Option.map(Option.scala:146)
    at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:316)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:153)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
benmayne@Benjamins-MacBook-Pro ~ $
{noformat}
If I replace the path with a valid HDFS path
(hdfs:///user/benmayne/valid-jar.jar), it works as expected. I've seen the same
behavior on 2.2.0 (Hadoop 2.6 & 2.7, on Mac and Ubuntu) and on 2.1.1 on
Ubuntu.
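For reference, here is a minimal standalone sketch (illustrative, not part of the original report; the object name is made up and the URL is the placeholder from above) that hits the same FileSystem.get scheme lookup shown at the bottom of the stack trace, assuming a Hadoop 2.x client on the classpath:
{noformat}
// Minimal sketch reproducing "No FileSystem for scheme: https" outside of
// spark-submit, via the same Hadoop API that SparkSubmit.downloadFile calls.
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

object HttpsSchemeRepro {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // Hadoop resolves a FileSystem implementation from the URI scheme; no
    // implementation is registered for "https", so this throws
    // java.io.IOException: No FileSystem for scheme: https
    FileSystem.get(new URI("https://test.com/path/to/jar.jar"), conf)
    // The same call with an hdfs:// URI (and HDFS configured) resolves normally.
  }
}
{noformat}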
This is the example that I'm trying to replicate, from
https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management:
> Spark uses the following URL scheme to allow different strategies for
> disseminating jars:
> - file: - Absolute paths and file:/ URIs are served by the driver’s HTTP file
>   server, and every executor pulls the file from the driver HTTP server.
> - hdfs:, http:, https:, ftp: - these pull down files and JARs from the URI as
>   expected
{noformat}
# Run on a Mesos cluster in cluster deploy mode with supervise
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master mesos://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  http://path/to/examples.jar \
  1000
{noformat}
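Until http(s) URIs are handled, a possible workaround (purely illustrative, not something the docs describe; the local path below is hypothetical) is to pre-download the JAR over HTTPS and pass the local path to spark-submit instead:
{noformat}
// Hypothetical workaround sketch: fetch the JAR to a local file first, then
// submit with the local path rather than the https:// URI.
import java.net.URL
import java.nio.file.{Files, Paths, StandardCopyOption}

object PreDownloadJar {
  def main(args: Array[String]): Unit = {
    val remote = new URL("https://test.com/path/to/jar.jar") // placeholder URL from above
    val local  = Paths.get("/tmp/jar.jar")                   // hypothetical local path
    val in = remote.openStream()
    try Files.copy(in, local, StandardCopyOption.REPLACE_EXISTING)
    finally in.close()
    // Then: spark-submit --deploy-mode client --master local[2] \
    //         --class class.name.Test /tmp/jar.jar
  }
}
{noformat}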