[
https://issues.apache.org/jira/browse/SPARK-46860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17938064#comment-17938064
]
Krzysztof Ruta edited comment on SPARK-46860 at 3/25/25 6:56 AM:
-----------------------------------------------------------------
The former PR [#50375|https://github.com/apache/spark/pull/50375] failed on a
single (flaky?) test that had not failed before (I had run the whole workflow
several times). The current one
[#50377|https://github.com/apache/spark/pull/50377] passes all checks.
The suspicious test was:
<testcase
  classname="org.apache.spark.sql.streaming.FlatMapGroupsWithStateWithInitialStateSuite"
  name="flatMapGroupsWithState - initial state and initial batch have same keys and skipEmittingInitialStateKeys=false - state format version 1"
  time="0.84">
> Credentials with https url not working for --jars, --files, --archives &
> --py-files options on spark-submit command
> -------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-46860
> URL: https://issues.apache.org/jira/browse/SPARK-46860
> Project: Spark
> Issue Type: Task
> Components: k8s
> Affects Versions: 3.3.3, 3.5.0, 3.3.4
> Environment: Spark 3.3.3 deployed on K8s
> Reporter: Vikram Janarthanan
> Priority: Major
> Labels: pull-request-available
>
> We are trying to run a Spark application whose dependent files, as well as
> the main PySpark script, are served from a secure webserver.
> We are looking for a solution to pass the dependencies as well as the PySpark
> script from the webserver.
> We have tried deploying the Spark application from the webserver to the k8s
> cluster without a username and password and it worked, but when we tried with
> a username/password we got: *Exception in thread "main"
> java.io.IOException: Server returned HTTP response code: 401 for URL:
> https://username:password@domain.com/application/pysparkjob.py*
> *Working options on spark-submit:*
> spark-submit ......
> --repositories https://username:password@domain.com/repo1/repo \
> --jars https://domain.com/jars/runtime.jar \
> --files https://domain.com/files/query.sql \
> --py-files https://domain.com/pythonlib/pythonlib.zip \
> https://domain.com/app1/pysparkapp.py
> Note: only the --repositories option works with a username and password.
> *Spark-submit using https url with username/password not working:*
> spark-submit ......
> --jars https://username:password@domain.com/jars/runtime.jar \
> --files https://username:password@domain.com/files/query.sql \
> --py-files https://username:password@domain.com/pythonlib/pythonlib.zip \
> https://username:password@domain.com/app1/pysparkapp.py
>
> Error:
> 25/01/23 09:19:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Exception in thread "main" java.io.IOException: Server returned HTTP response code: 401 for URL: https://username:password@domain.com/repository/spark-artifacts/pysparkdemo/1.0/pysparkdemo-1.0.tgz
>   at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:2000)
>   at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1589)
>   at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:224)
>   at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:809)
>   at org.apache.spark.util.DependencyUtils$.downloadFile(DependencyUtils.scala:264)
>   at org.apache.spark.util.DependencyUtils$.$anonfun$downloadFileList$2(DependencyUtils.scala:233)
>   at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>   at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
>   at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:286)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>
>
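A likely explanation for the 401 above: the JDK's HttpURLConnection (which Utils.doFetchFile uses) does not turn userinfo embedded in a URL into credentials, so `https://user:pass@host/...` is fetched unauthenticated. A minimal sketch of the kind of translation a fix would need, converting URL userinfo into an explicit Basic Authorization header (the class and method names here are hypothetical, not Spark's actual patch):

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class UrlAuth {
    /**
     * Returns the "Authorization" header value for a URL with embedded
     * credentials (https://user:pass@host/...), or null when the URL
     * carries no userinfo.
     */
    public static String basicAuthHeader(String url) {
        String userInfo = URI.create(url).getUserInfo();
        if (userInfo == null) {
            return null;
        }
        // RFC 7617 Basic auth: base64("user:pass")
        return "Basic " + Base64.getEncoder()
                .encodeToString(userInfo.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // A manually opened connection could then do:
        //   conn.setRequestProperty("Authorization", basicAuthHeader(url));
        System.out.println(
            basicAuthHeader("https://user:secret@domain.com/app1/pysparkapp.py"));
    }
}
```

This also explains why `--repositories` worked in the report: dependency resolution goes through Ivy, which does honor credentials in repository URLs, while `--jars`/`--files`/`--py-files` go through Spark's own HTTP fetch path.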
--
This message was sent by Atlassian Jira
(v8.20.10#820010)