Vikram Janarthanan created SPARK-46860:
------------------------------------------

             Summary: Credentials with https url not working for --jars, 
--files, --archives & --py-files options on spark-submit command
                 Key: SPARK-46860
                 URL: https://issues.apache.org/jira/browse/SPARK-46860
             Project: Spark
          Issue Type: Task
          Components: k8s
    Affects Versions: 3.3.4
         Environment: Spark 3.3.3 deployed on K8s 
            Reporter: Vikram Janarthanan


We are trying to run a Spark application whose dependent files, as well as
the main PySpark script, are served from a secure web server.

We are looking for a way to pass both the dependencies and the PySpark script
from the web server.

We tried deploying the Spark application from the web server to the K8s
cluster without a username and password, and it worked. When we tried with a
username/password in the URL, we got a 401:

Exception in thread "main" java.io.IOException: Server returned HTTP response
code: 401 for URL: https://username:passw...@domain.com/application/pysparkjob.py

*Working options on spark-submit:*
spark-submit ......
--repositories https://username:passw...@domain.com/repo1/repo \
--jars https://domain.com/jars/runtime.jar \
--files https://domain.com/files/query.sql \
--py-files https://domain.com/pythonlib/pythonlib.zip \
https://domain.com/app1/pysparkapp.py

Note: only the --repositories option works with a username and password.

*spark-submit using https URLs with username/password (not working):*

spark-submit ......
--jars https://username:passw...@domain.com/jars/runtime.jar \
--files https://username:passw...@domain.com/files/query.sql \
--py-files https://username:passw...@domain.com/pythonlib/pythonlib.zip \
https://username:passw...@domain.com/app1/pysparkapp.py

 

Error:

25/01/23 09:19:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.io.IOException: Server returned HTTP response code: 401 for URL: https://username:passw...@domain.com/repository/spark-artifacts/pysparkdemo/1.0/pysparkdemo-1.0.tgz
        at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:2000)
        at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1589)
        at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:224)
        at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:809)
        at org.apache.spark.util.DependencyUtils$.downloadFile(DependencyUtils.scala:264)
        at org.apache.spark.util.DependencyUtils$.$anonfun$downloadFileList$2(DependencyUtils.scala:233)
        at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
        at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
        at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
        at scala.collection.TraversableLike.map(TraversableLike.scala:286)
        at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
        at scala.collection.AbstractTraversable.map(Traversable.scala:108)
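
For illustration only: the trace ends in HttpsURLConnectionImpl.getInputStream,
called from Utils.doFetchFile. One possible reading (an assumption, not
confirmed against the Spark source) is that the user:password part of the URL
is not converted into an HTTP Basic Authorization header when the file is
fetched. The Python sketch below shows what that conversion looks like.
split_credentials is a hypothetical helper name, and the URL and credentials
used with it are placeholders.

```python
import base64
from urllib.parse import urlsplit, urlunsplit, unquote


def split_credentials(url):
    """Split scheme://user:password@host/path into a URL without the
    userinfo part and a dict of HTTP headers carrying the credentials.

    Hypothetical helper for illustration; it is not part of Spark.
    """
    parts = urlsplit(url)
    headers = {}
    if parts.username is not None:
        # Userinfo may be percent-encoded inside a URL, so decode it first.
        user = unquote(parts.username)
        password = unquote(parts.password or "")
        raw = f"{user}:{password}".encode("utf-8")
        token = base64.b64encode(raw).decode("ascii")
        headers["Authorization"] = f"Basic {token}"
    # Keep everything after the last "@" (host and optional port).
    netloc = parts.netloc.rpartition("@")[2]
    clean_url = urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )
    return clean_url, headers
```

A client that sends the returned headers along with the cleaned URL (for
example via urllib.request.Request(clean_url, headers=headers)) authenticates
explicitly instead of relying on the credentials embedded in the URL.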

 

 


