Ben Mayne created SPARK-21618:
---------------------------------

             Summary: http(s) not accepted in spark-submit jar uri
                 Key: SPARK-21618
                 URL: https://issues.apache.org/jira/browse/SPARK-21618
             Project: Spark
          Issue Type: Bug
          Components: Deploy
    Affects Versions: 2.2.0, 2.1.1
         Environment: Pre-built for Hadoop 2.6 and 2.7, on macOS and Ubuntu 16.04.
            Reporter: Ben Mayne
            Priority: Minor


The documentation suggests I should be able to use an http(s) URI for a jar in spark-submit, but I haven't been successful:
https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management

{noformat}
benmayne@Benjamins-MacBook-Pro ~ $ spark-submit --deploy-mode client --master local[2] --class class.name.Test https://test.com/path/to/jar.jar
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.io.IOException: No FileSystem for scheme: https
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2586)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2593)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
        at org.apache.spark.deploy.SparkSubmit$.downloadFile(SparkSubmit.scala:865)
        at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$1.apply(SparkSubmit.scala:316)
        at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$1.apply(SparkSubmit.scala:316)
        at scala.Option.map(Option.scala:146)
        at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:316)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:153)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
benmayne@Benjamins-MacBook-Pro ~ $
benmayne@Benjamins-MacBook-Pro ~ $
{noformat}
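
For reference, the same failure can be reproduced outside spark-submit by going through Hadoop's FileSystem API directly, which is what SparkSubmit$.downloadFile does per the stack trace above. This is a minimal sketch, assuming a stock Hadoop 2.x client on the classpath; the URL is the same placeholder used in the command above.

{noformat}
import java.net.URI

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

object HttpsSchemeCheck {
  def main(args: Array[String]): Unit = {
    // Same lookup that SparkSubmit$.downloadFile performs; a stock Hadoop
    // client has no FileSystem registered for the "https" scheme, so this
    // throws java.io.IOException: No FileSystem for scheme: https
    val fs = FileSystem.get(new URI("https://test.com/path/to/jar.jar"), new Configuration())
    println(fs.getClass.getName)
  }
}
{noformat}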

If I replace the path with a valid hdfs path (hdfs:///user/benmayne/valid-jar.jar), it works as expected. I've seen the same behavior across 2.2.0 (Hadoop 2.6 & 2.7 on macOS and Ubuntu) and 2.1.1 on Ubuntu.
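
For contrast, a minimal sketch of the same FileSystem lookup with the hdfs scheme (assuming an HDFS client is on the classpath and the Hadoop configuration points fs.defaultFS at an HDFS cluster; the path is the placeholder from above). It resolves instead of throwing:

{noformat}
import java.net.URI

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

object HdfsSchemeCheck {
  def main(args: Array[String]): Unit = {
    // Resolves (typically to DistributedFileSystem) because Hadoop ships a
    // FileSystem implementation for the "hdfs" scheme, matching the working
    // hdfs:// submission described above.
    val fs = FileSystem.get(new URI("hdfs:///user/benmayne/valid-jar.jar"), new Configuration())
    println(fs.getClass.getName)
  }
}
{noformat}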

This is the example I'm trying to replicate from https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management:
 

> Spark uses the following URL scheme to allow different strategies for disseminating jars:
> file: - Absolute paths and file:/ URIs are served by the driver’s HTTP file server, and every executor pulls the file from the driver HTTP server.
> hdfs:, http:, https:, ftp: - these pull down files and JARs from the URI as expected


{noformat}
# Run on a Mesos cluster in cluster deploy mode with supervise
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master mesos://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  http://path/to/examples.jar \
  1000
{noformat}



