[
https://issues.apache.org/jira/browse/SPARK-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon updated SPARK-20001:
---------------------------------
Labels: bulk-closed (was: )
> Support PythonRunner executing inside a Conda env
> -------------------------------------------------
>
> Key: SPARK-20001
> URL: https://issues.apache.org/jira/browse/SPARK-20001
> Project: Spark
> Issue Type: New Feature
> Components: PySpark, Spark Core
> Affects Versions: 2.2.0
> Reporter: Dan Sanduleac
> Priority: Major
> Labels: bulk-closed
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> Similar to SPARK-13587, I'm trying to allow the user to configure a Conda
> environment that PythonRunner will run from.
> This change remembers the conda environment found on the driver and installs
> the same packages on the executor side, only once per PythonWorkerFactory.
> The list of requested conda packages is added to the PythonWorkerFactory
> cache, so two collects using the same environment (including packages) can
> reuse the same running executors.
> You have to specify up front which packages and channels to "bootstrap" the
> environment with.
> However, SparkContext (as well as JavaSparkContext and the PySpark version) is
> extended to support addCondaPackage and addCondaChannel.
> The rationale is:
> * you might want to add more packages once you're already running in the
> driver
> * you might want to add a channel that requires an authentication token,
> which you don't have access to until the module is already running
> This issue requires that the conda binary already be available on the driver
> as well as the executors; you just have to specify where it can be found.
> Please see the attached pull request on palantir/spark for additional
> details: https://github.com/palantir/spark/pull/115
> As for tests, there is a local Python test, as well as YARN client-mode and
> cluster-mode tests, which ensure that a newly installed package is visible
> from both the driver and the executor.
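
As a rough illustration of the reuse behavior described above, the following
Python sketch models a worker-factory cache keyed by the requested packages and
channels. It is a toy model under stated assumptions, not the code from the
pull request: the names CondaEnvSpec, WorkerFactory, WorkerFactoryCache, and
all of their methods are hypothetical, and the real change lives in Spark's
Scala code and may be organized quite differently. Treating packages as an
unordered set and channels as an ordered tuple is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical key type: unordered packages plus ordered channels.
EnvKey = Tuple[FrozenSet[str], Tuple[str, ...]]


@dataclass
class CondaEnvSpec:
    """Hypothetical driver-side record of the requested conda environment."""
    packages: Set[str] = field(default_factory=set)
    channels: List[str] = field(default_factory=list)

    def add_package(self, name: str) -> None:
        # Loosely mirrors the idea behind addCondaPackage: grow the spec at runtime.
        self.packages.add(name)

    def add_channel(self, url: str) -> None:
        # Loosely mirrors the idea behind addCondaChannel; duplicates are ignored.
        if url not in self.channels:
            self.channels.append(url)

    def cache_key(self) -> EnvKey:
        # Immutable snapshot, so later additions produce a different key.
        return (frozenset(self.packages), tuple(self.channels))


@dataclass(frozen=True)
class WorkerFactory:
    """Stand-in for a factory whose workers run inside one environment."""
    key: EnvKey


class WorkerFactoryCache:
    """Creates one factory per distinct environment and reuses it afterwards."""

    def __init__(self) -> None:
        self._factories: Dict[EnvKey, WorkerFactory] = {}
        self.installs = 0  # counts simulated one-time environment installs

    def get(self, spec: CondaEnvSpec) -> WorkerFactory:
        key = spec.cache_key()
        if key not in self._factories:
            # A real implementation would install the packages here, once.
            self.installs += 1
            self._factories[key] = WorkerFactory(key)
        return self._factories[key]


if __name__ == "__main__":
    spec = CondaEnvSpec()
    spec.add_package("numpy")
    cache = WorkerFactoryCache()
    first = cache.get(spec)
    second = cache.get(spec)       # same environment: factory is reused
    spec.add_package("pandas")
    third = cache.get(spec)        # changed environment: new factory
    print(first is second, first is third, cache.installs)
```

In this model, two jobs that request the same packages and channels share one
factory and trigger a single simulated install, while adding a package after
the fact yields a new key and therefore a fresh factory.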