[ https://issues.apache.org/jira/browse/SPARK-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15930922#comment-15930922 ]
Jeff Zhang commented on SPARK-20001:
------------------------------------
Thanks [~dansanduleac]. It looks like we are doing similar things. I recently
made some improvements in SPARK-13587; you can check my doc if you are
interested.
https://docs.google.com/document/d/1KB9RYW8_bSeOzwVqZFc_zy_vXqqqctwrU5TROP_16Ds/edit
> Support PythonRunner executing inside a Conda env
> -------------------------------------------------
>
> Key: SPARK-20001
> URL: https://issues.apache.org/jira/browse/SPARK-20001
> Project: Spark
> Issue Type: New Feature
> Components: PySpark, Spark Core
> Affects Versions: 2.2.0
> Reporter: Dan Sanduleac
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> Similar to SPARK-13587, I'm trying to allow the user to configure a Conda
> environment that PythonRunner will run from.
> This change remembers the conda environment found on the driver and installs
> the same packages on the executor side, only once per PythonWorkerFactory.
> The list of requested conda packages is added to the PythonWorkerFactory
> cache key, so two collects using the same environment (including packages)
> can re-use the same running executors.
> You have to specify outright what packages and channels to "bootstrap" the
> environment with.
> However, SparkContext (as well as JavaSparkContext and the PySpark version)
> is extended to support addCondaPackage and addCondaChannel.
> The rationale is:
> * you might want to add more packages once you're already running in the
> driver
> * you might want to add a channel which requires some token for
> authentication, which you don't have access to until the module is
> already running
> This issue requires that the conda binary is already available on the driver
> as well as on the executors; you just have to specify where it can be found.
> Please see the attached pull request on palantir/spark for additional
> details: https://github.com/palantir/spark/pull/115
> As for tests, there is a local python test, as well as yarn client &
> cluster-mode tests, which ensure that a newly installed package is visible
> from both the driver and the executor.
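
The caching behaviour described in the quoted issue (install once per PythonWorkerFactory, and re-use workers when two jobs request the same environment) can be illustrated with a small standalone sketch. This is not the code from the pull request: the names EnvKey, WorkerFactory and WorkerFactoryCache are hypothetical, the conda install step is replaced by a counter, and treating package order as irrelevant while keeping channel order significant is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class EnvKey:
    """Hypothetical cache key: the environment is part of the worker identity."""
    python_exec: str
    conda_packages: FrozenSet[str]   # assumption: package order does not matter
    conda_channels: Tuple[str, ...]  # assumption: channel order is kept as given


class WorkerFactory:
    """Stand-in for a per-environment worker factory."""

    def __init__(self, key: EnvKey) -> None:
        self.key = key
        self.installs = 0  # counts how often the environment was set up

    def ensure_environment(self) -> None:
        # Set the environment up only on first use; real code would
        # invoke the conda binary here instead of bumping a counter.
        if self.installs == 0:
            self.installs += 1


class WorkerFactoryCache:
    """Returns the same factory for requests with an identical environment."""

    def __init__(self) -> None:
        self._factories: Dict[EnvKey, WorkerFactory] = {}

    def get(self, python_exec: str, packages: Iterable[str],
            channels: Iterable[str]) -> WorkerFactory:
        key = EnvKey(python_exec, frozenset(packages), tuple(channels))
        factory = self._factories.get(key)
        if factory is None:
            factory = WorkerFactory(key)
            self._factories[key] = factory
        factory.ensure_environment()
        return factory


cache = WorkerFactoryCache()
first = cache.get("python", ["numpy", "pandas"], ["defaults"])
second = cache.get("python", ["pandas", "numpy"], ["defaults"])
other = cache.get("python", ["numpy"], ["defaults"])
```

Here `first` and `second` resolve to the same factory, whose environment is set up only once, while `other` requests a different package set and therefore gets a separate factory.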