[ https://issues.apache.org/jira/browse/SPARK-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15930922#comment-15930922 ]
Jeff Zhang edited comment on SPARK-20001 at 3/18/17 12:14 AM:
--------------------------------------------------------------

Thanks [~dansanduleac]. It looks like we are doing similar things; I recently made some improvements in SPARK-13587. You can check my doc if you are interested: https://docs.google.com/document/d/1KB9RYW8_bSeOzwVqZFc_zy_vXqqqctwrU5TROP_16Ds/edit

> Support PythonRunner executing inside a Conda env
> -------------------------------------------------
>
>                 Key: SPARK-20001
>                 URL: https://issues.apache.org/jira/browse/SPARK-20001
>             Project: Spark
>          Issue Type: New Feature
>          Components: PySpark, Spark Core
>    Affects Versions: 2.2.0
>            Reporter: Dan Sanduleac
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Similar to SPARK-13587, I'm trying to allow the user to configure a Conda environment that PythonRunner will run from.
> This change remembers the conda environment found on the driver and installs the same packages on the executor side, only once per PythonWorkerFactory.
> The list of requested conda packages is added to the PythonWorkerFactory cache, so two collects using the same environment (including packages) can reuse the same running executors.
> You have to specify up front which packages and channels to "bootstrap" the environment with.
> However, SparkContext (as well as JavaSparkContext and the pyspark version) is expanded to support addCondaPackage and addCondaChannel.
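The driver-side bookkeeping described above can be sketched in a few lines. This is a minimal, self-contained illustration, not the actual Spark internals: the `CondaEnvSpec` class, its method names, and the paths are illustrative stand-ins for how the driver might remember requested packages/channels and turn them into a `conda create` invocation on each side.

```python
class CondaEnvSpec:
    """Illustrative stand-in: the driver records requested conda
    packages/channels, and the same spec is replayed on the executors."""

    def __init__(self):
        self.packages = []
        self.channels = []

    def add_package(self, pkg):
        # Mirrors the proposed SparkContext.addCondaPackage.
        if pkg not in self.packages:
            self.packages.append(pkg)

    def add_channel(self, url):
        # Mirrors the proposed SparkContext.addCondaChannel.
        if url not in self.channels:
            self.channels.append(url)

    def bootstrap_command(self, conda_binary, prefix):
        # The conda binary must already exist at `conda_binary`;
        # this only builds the command that bootstraps the env.
        cmd = [conda_binary, "create", "-y", "--prefix", prefix]
        for ch in self.channels:
            cmd += ["--channel", ch]
        return cmd + self.packages
```

For example, after `add_package("numpy")` and `add_channel(...)`, `bootstrap_command("/opt/conda/bin/conda", "/tmp/env")` yields a standard `conda create -y --prefix ... --channel ... numpy` command line.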
> Rationale is:
> * you might want to add more packages once you're already running in the driver
> * you might want to add a channel which requires some token for authentication, which you don't have access to until the module is already running
> This issue requires that the conda binary is already available on the driver as well as on the executors; you just have to specify where it can be found.
> Please see the attached pull request on palantir/spark for additional details: https://github.com/palantir/spark/pull/115
> As for tests, there is a local python test, as well as yarn client- and cluster-mode tests, which ensure that a newly installed package is visible from both the driver and the executor.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
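The executor-reuse behavior described in the issue (two collects with the same environment, including packages, reusing the same running executors) amounts to making the requested package set part of the worker-factory cache key. A minimal sketch, with `WorkerFactoryCache` and its members as hypothetical stand-ins for the real PythonWorkerFactory caching:

```python
class WorkerFactoryCache:
    """Illustrative stand-in for PythonWorkerFactory caching: the
    requested conda packages are part of the cache key, so jobs that
    request the same environment reuse the same factory (and hence
    the same running workers)."""

    def __init__(self):
        self._factories = {}
        self.created = 0

    def get(self, python_exec, conda_packages):
        # frozenset makes the key order-insensitive: requesting
        # ["numpy", "pandas"] and ["pandas", "numpy"] hits the same entry.
        key = (python_exec, frozenset(conda_packages))
        if key not in self._factories:
            self.created += 1
            self._factories[key] = "factory-%d" % self.created
        return self._factories[key]
```

Two lookups with the same interpreter and package set return the same factory; a different package set creates (and caches) a new one.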