[ 
https://issues.apache.org/jira/browse/SPARK-28095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emma Dickson updated SPARK-28095:
---------------------------------
    Description: 
When arguments are passed to a bash script that sets up spark-submit for a 
Python file that creates a PySpark context, strings containing spaces are 
split into separate arguments. This happens even when the argument is enclosed 
in double quotes, escaped with backslashes, or written with Unicode escape 
characters.
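
The report does not include spark-k8s.sh, so the cause is not confirmed. One 
common way for quoting to be lost is a wrapper that joins its arguments into a 
single string and lets the shell split that string again. The sketch below 
only simulates that assumption in Python 3 with the standard shlex module; the 
variable names are illustrative and nothing here is taken from the actual 
script.

```python
import shlex

# The command line as typed in the report (Python 3 assumed for shlex.quote).
typed = (
    './scripts/spark-k8s.sh v0.0.32 --job-args '
    '"cos://waas-logentries.mycos/Logentries/IBM-b634032e/Github/Load Balancer" '
    '--job pages'
)

# The interactive shell honours the double quotes: the path is one argument.
argv = shlex.split(typed)
script_args = argv[1:]

# Assumed wrapper behaviour: the arguments are joined into one string and
# expanded again without quotes, so every space becomes a separator.
resplit_unquoted = " ".join(script_args).split()

# Quoting each argument before joining survives a second round of parsing.
requoted = " ".join(shlex.quote(arg) for arg in script_args)
resplit_quoted = shlex.split(requoted)

print(resplit_unquoted)
print(resplit_quoted)
```

In bash itself, forwarding arguments as "$@" (with the double quotes) keeps 
each argument intact. Whether that is the relevant fix here depends on how 
spark-k8s.sh and the container entrypoint forward their arguments, which the 
report does not show.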

 

Example

Command entered
{code:java}
./scripts/spark-k8s.sh v0.0.32 --job-args "cos://waas-logentries.mycos/Logentries/IBM-b634032e/Github/Load Balancer" --job pages
{code}
 

Error Message

 
{code:java}
+ exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf 
spark.driver.bindAddress=172.30.83.253 --deploy-mode client --properties-file 
/opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner 
/opt/spark/work-dir/main.py --job-args 
cos://waas-logentries.mycos/Logentries/IBM-b634032e/Github/Load Balancer --job 
pages
19/06/18 19:28:35 WARN Utils: Kubernetes master URL uses HTTP instead of HTTPS.
19/06/18 19:28:36 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
usage: main.py [-h] --job JOB --job-args JOB_ARGS
main.py: error: unrecognized arguments: Balancer
{code}
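
The final error can be reproduced without Spark or Kubernetes. The sketch 
below is a guess at the parser in main.py, reconstructed only from the usage 
line above (two required options, --job and --job-args); the real main.py is 
not shown in this report, and build_parser and try_parse are illustrative 
names. It compares the argument list as the trace shows it arriving (path 
split in two) with the list the user intended.

```python
import argparse


def build_parser():
    # Assumed to match "usage: main.py [-h] --job JOB --job-args JOB_ARGS".
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--job", required=True)
    parser.add_argument("--job-args", required=True)
    return parser


def try_parse(argv):
    # argparse prints its error to stderr and raises SystemExit on failure.
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        return None


path = "cos://waas-logentries.mycos/Logentries/IBM-b634032e/Github/Load Balancer"

# As received in the trace: the path has already been split at the space.
split_argv = ["--job-args"] + path.split(" ") + ["--job", "pages"]

# As intended: the path is a single argument.
intact_argv = ["--job-args", path, "--job", "pages"]

print(try_parse(split_argv))   # None, after argparse reports the stray token
print(try_parse(intact_argv))
```

With the split list, --job-args consumes only the part of the path before the 
space, and the leftover token "Balancer" is rejected as unrecognized. This 
suggests the argument is already split before main.py runs, so the parser 
itself is probably not at fault.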

  was:
When passing in arguments to a bash script that calls a python file which sets 
up a pyspark context strings with spaces are processed as individual strings 
even when encased in double quotes, using backslashes or unicode escape 
characters.

 

Example

Command entered
{code:java}
./scripts/spark-k8s.sh v0.0.32 --job-args "cos://waas-logentries.mycos/Logentries/IBM-b634032e/Github/Load Balancer" --job pages
{code}
 

Error Message

 
{code:java}
+ exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf 
spark.driver.bindAddress=172.30.83.253 --deploy-mode client --properties-file 
/opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner 
/opt/spark/work-dir/main.py --job-args 
cos://waas-logentries.mycos/Logentries/IBM-b634032e/Github/Load Balancer --job 
pages
19/06/18 19:28:35 WARN Utils: Kubernetes master URL uses HTTP instead of HTTPS.
19/06/18 19:28:36 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
usage: main.py [-h] --job JOB --job-args JOB_ARGS
main.py: error: unrecognized arguments: Balancer
{code}


> Pyspark with kubernetes doesn't parse arguments with spaces as expected.
> ------------------------------------------------------------------------
>
>                 Key: SPARK-28095
>                 URL: https://issues.apache.org/jira/browse/SPARK-28095
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes, PySpark
>    Affects Versions: 2.4.3
>         Environment: Python 2.7.13
> Spark 2.4.3
> Kubernetes
>  
>            Reporter: Emma Dickson
>            Priority: Minor
>              Labels: newbie, usability



