[ https://issues.apache.org/jira/browse/SPARK-25433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16616992#comment-16616992 ]
Hyukjin Kwon commented on SPARK-25433:
--------------------------------------

What are the advantages of adding this, and what are the alternatives? A JIRA should describe what to fix, not how to fix it.

> Add support for PEX in PySpark
> ------------------------------
>
>                 Key: SPARK-25433
>                 URL: https://issues.apache.org/jira/browse/SPARK-25433
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 2.2.2
>            Reporter: Fabian Höring
>            Priority: Minor
>
> This has been partly discussed in SPARK-13587.
> I would like to provision the executors with a PEX package. I created a PR
> with the minimal necessary changes in PythonWorkerFactory.
> PR: [https://github.com/apache/spark/pull/22422/files]
> To run it, one needs to set the PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON
> variables to the pex file and upload the pex file to the executors via
> sparkContext.addFile or by setting the spark.yarn.dist.files or spark.files
> config properties.
> It is also necessary to set the PEX_ROOT environment variable. By default,
> inside the executors it tries to access /home/.pex, and this fails.
> Ideally, as this configuration is quite cumbersome, it would be useful
> to also add a --pexFile parameter to SparkContext and spark-submit in order
> to provide a pex file directly and have everything else handled
> automatically. Please tell me what you think of this.
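
For readers trying to follow the manual setup described in the quoted issue, here is a minimal sketch of the settings involved: pointing PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON at the pex file, distributing the file, and setting PEX_ROOT. It only builds the values and does not start Spark. The helper name pex_settings, the default PEX_ROOT of ./.pex, the relative executor path, and the use of spark.executorEnv.PEX_ROOT to pass the variable to executors are assumptions made for illustration; none of them come from the issue or the PR, and the setup may only work with the PR's changes applied.

```python
import os


def pex_settings(pex_path, pex_root="./.pex", use_yarn=True):
    """Build the environment variables and Spark properties named in the issue.

    Hypothetical helper: the name, the default PEX_ROOT value, and passing
    PEX_ROOT through spark.executorEnv.* are assumptions, not part of the PR.
    """
    pex_path = os.path.abspath(pex_path)

    # Assumption: distributed files are placed in each executor's working
    # directory, so executors refer to the pex file by a relative path.
    executor_python = "./" + os.path.basename(pex_path)

    env = {
        "PYSPARK_DRIVER_PYTHON": pex_path,   # driver runs the local pex file
        "PYSPARK_PYTHON": executor_python,   # executors run the shipped copy
        "PEX_ROOT": pex_root,                # avoid the default /home/.pex
    }

    # The issue mentions both properties; pick one depending on the deployment.
    files_key = "spark.yarn.dist.files" if use_yarn else "spark.files"
    conf = {
        files_key: pex_path,
        "spark.executorEnv.PEX_ROOT": pex_root,
    }
    return env, conf


if __name__ == "__main__":
    env, conf = pex_settings("myapp.pex")  # "myapp.pex" is a placeholder name
    for name, value in sorted(env.items()):
        print("export {}={}".format(name, value))
    for key, value in sorted(conf.items()):
        print("--conf {}={}".format(key, value))
```

The printed lines can be pasted into a shell and a spark-submit invocation. A --pexFile option, as proposed in the issue, would replace all of this with a single argument.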