[ https://issues.apache.org/jira/browse/SPARK-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15930922#comment-15930922 ]
Jeff Zhang edited comment on SPARK-20001 at 3/18/17 12:14 AM:
--------------------------------------------------------------

Thanks [~dansanduleac]. It looks like we are doing similar things; I recently made some improvements in SPARK-13587. You can check my doc if you are interested: https://docs.google.com/document/d/1KB9RYW8_bSeOzwVqZFc_zy_vXqqqctwrU5TROP_16Ds/edit

> Support PythonRunner executing inside a Conda env
> -------------------------------------------------
>
>                 Key: SPARK-20001
>                 URL: https://issues.apache.org/jira/browse/SPARK-20001
>             Project: Spark
>          Issue Type: New Feature
>          Components: PySpark, Spark Core
>    Affects Versions: 2.2.0
>            Reporter: Dan Sanduleac
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Similar to SPARK-13587, I'm trying to allow the user to configure a Conda environment that PythonRunner will run from.
> This change remembers the conda environment found on the driver and installs the same packages on the executor side, only once per PythonWorkerFactory.
> The list of requested conda packages is added to the PythonWorkerFactory cache, so two collects using the same environment (including packages) can reuse the same running executors.
> You have to specify up front which packages and channels to "bootstrap" the environment with.
> However, SparkContext (as well as JavaSparkContext and the pyspark version) is expanded to support addCondaPackage and addCondaChannel.
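The driver-side bookkeeping described above can be sketched in a few lines. This is a minimal, self-contained illustration, not the actual Spark internals: the `CondaEnvSpec` class, its method names, and the paths are illustrative stand-ins for how the driver might remember requested packages/channels and turn them into a `conda create` invocation on each side.

```python
class CondaEnvSpec:
    """Illustrative stand-in: the driver records requested conda
    packages/channels, and the same spec is replayed on the executors."""

    def __init__(self):
        self.packages = []
        self.channels = []

    def add_package(self, pkg):
        # Mirrors the proposed SparkContext.addCondaPackage.
        if pkg not in self.packages:
            self.packages.append(pkg)

    def add_channel(self, url):
        # Mirrors the proposed SparkContext.addCondaChannel.
        if url not in self.channels:
            self.channels.append(url)

    def bootstrap_command(self, conda_binary, prefix):
        # The conda binary must already exist at `conda_binary`;
        # this only builds the command that bootstraps the env.
        cmd = [conda_binary, "create", "-y", "--prefix", prefix]
        for ch in self.channels:
            cmd += ["--channel", ch]
        return cmd + self.packages
```

For example, after `add_package("numpy")` and `add_channel(...)`, `bootstrap_command("/opt/conda/bin/conda", "/tmp/env")` yields a standard `conda create -y --prefix ... --channel ... numpy` command line.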
> Rationale is:
> * you might want to add more packages once you're already running in the driver
> * you might want to add a channel which requires some token for authentication, which you don't have access to until the module is already running
> This issue requires that the conda binary is already available on the driver as well as on the executors; you just have to specify where it can be found.
> Please see the attached pull request on palantir/spark for additional details: https://github.com/palantir/spark/pull/115
> As for tests, there is a local python test, as well as yarn client- and cluster-mode tests, which ensure that a newly installed package is visible from both the driver and the executor.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
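The executor-reuse behavior described in the issue (two collects with the same environment, including packages, reusing the same running executors) amounts to making the requested package set part of the worker-factory cache key. A minimal sketch, with `WorkerFactoryCache` and its members as hypothetical stand-ins for the real PythonWorkerFactory caching:

```python
class WorkerFactoryCache:
    """Illustrative stand-in for PythonWorkerFactory caching: the
    requested conda packages are part of the cache key, so jobs that
    request the same environment reuse the same factory (and hence
    the same running workers)."""

    def __init__(self):
        self._factories = {}
        self.created = 0

    def get(self, python_exec, conda_packages):
        # frozenset makes the key order-insensitive: requesting
        # ["numpy", "pandas"] and ["pandas", "numpy"] hits the same entry.
        key = (python_exec, frozenset(conda_packages))
        if key not in self._factories:
            self.created += 1
            self._factories[key] = "factory-%d" % self.created
        return self._factories[key]
```

Two lookups with the same interpreter and package set return the same factory; a different package set creates (and caches) a new one.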