[ 
https://issues.apache.org/jira/browse/SPARK-21945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16488486#comment-16488486
 ] 

Hyukjin Kwon commented on SPARK-21945:
--------------------------------------

Right. This case looks odd again. When the given Python file is a .py file 
(a zip file seems fine), the Python path appears to be added dynamically only 
after the context has been initialized.

With this Python file:

{code}
$ cat /home/spark/tmp.py
def testtest():
    return 1
{code}

This works:

{code}
$ cat app.py
import pyspark
pyspark.sql.SparkSession.builder.getOrCreate()
import tmp
print("************************%s" % tmp.testtest())

$ ./bin/spark-submit --master yarn --deploy-mode client --py-files /home/spark/tmp.py app.py
...
************************1
{code}

But this doesn't:

{code}
$ cat app.py
import pyspark
import tmp
pyspark.sql.SparkSession.builder.getOrCreate()
print("************************%s" % tmp.testtest())

$ ./bin/spark-submit --master yarn --deploy-mode client --py-files /home/spark/tmp.py app.py
Traceback (most recent call last):
  File "/home/spark/spark/app.py", line 2, in <module>
    import tmp
ImportError: No module named tmp
{code}
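
The difference between the two runs can be mimicked without Spark. The sketch below is only an illustration of the suspected mechanism, not Spark's actual code: the staging directory, the module name pyfiles_demo_mod, and the helper try_import are hypothetical stand-ins. It assumes the directory holding the .py file is put on sys.path only at a later point, the way the path seems to be added during context initialization.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical stand-in for the directory where the --py-files module ends up.
staging_dir = tempfile.mkdtemp()
with open(os.path.join(staging_dir, "pyfiles_demo_mod.py"), "w") as f:
    f.write("def testtest():\n    return 1\n")


def try_import(name):
    """Return the imported module, or None if it cannot be found."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Analogous to "import tmp" placed before getOrCreate(): path not registered yet.
before = try_import("pyfiles_demo_mod")

# Stand-in for the path being added dynamically during initialization (assumption).
sys.path.insert(0, staging_dir)
importlib.invalidate_caches()

# Analogous to "import tmp" placed after getOrCreate(): the module is now found.
after = try_import("pyfiles_demo_mod")

print(before is None, after.testtest())
```

If the assumption holds, this matches the behavior above: the import only succeeds once the path has been registered, which is why moving "import tmp" below getOrCreate() makes app.py work.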



> pyspark --py-files doesn't work in yarn client mode
> ---------------------------------------------------
>
>                 Key: SPARK-21945
>                 URL: https://issues.apache.org/jira/browse/SPARK-21945
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.2.0
>            Reporter: Thomas Graves
>            Assignee: Hyukjin Kwon
>            Priority: Major
>             Fix For: 2.3.1, 2.4.0
>
>
> I tried running pyspark with --py-files pythonfiles.zip  but it doesn't 
> properly add the zip file to the PYTHONPATH.
> I can work around by exporting PYTHONPATH.
> Looking in SparkSubmitCommandBuilder.buildPySparkShellCommand  I don't see 
> this supported at all.   If that is the case perhaps it should be moved to 
> improvement.
> Note it works via spark-submit in both client and cluster mode to run python 
> script.


