[
https://issues.apache.org/jira/browse/SPARK-13587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15212848#comment-15212848
]
Jeff Zhang commented on SPARK-13587:
------------------------------------
bq. Have you considered using NFS or Amazon EFS to allow users to create and
manage their own envs and then mounting those on worker/executor nodes?
The problem is that most of the time you are not the administrator and don't
have permission to do that, and it is inefficient to ask your administrator
to install the environment for you.
bq. One alternative to shared mounts is to store the thing in HDFS and use
something like --files / --archives in Spark.
Some packages contain native code and need to be compiled on the target
platform, and it is not easy to do dependency management this way.
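(For context on why --py-files / --archives only go so far: PySpark distributes the archive and prepends it to sys.path on the executors, so pure-Python zips import fine but compiled extensions generally do not. A minimal local sketch of that mechanism, with a hypothetical module name:)

```python
import os
import sys
import tempfile
import zipfile

# Illustrative only: --py-files works by shipping an archive to the
# executors and putting it on sys.path, so pure-Python zips import fine.
tmp_dir = tempfile.mkdtemp()
zip_path = os.path.join(tmp_dir, "deps.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    # A tiny pure-Python "dependency" (hypothetical module name).
    zf.writestr("mydep.py", "def greet():\n    return 'hello from dep'\n")

# PySpark executors do essentially this for each --py-files entry:
sys.path.insert(0, zip_path)

import mydep
print(mydep.greet())  # → hello from dep
```

A package with C extensions cannot be loaded this way, which is where per-node environments come in.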
> Support virtualenv in PySpark
> -----------------------------
>
> Key: SPARK-13587
> URL: https://issues.apache.org/jira/browse/SPARK-13587
> Project: Spark
> Issue Type: New Feature
> Components: PySpark
> Reporter: Jeff Zhang
>
> Currently, it's not easy for users to add third-party Python packages in
> pyspark.
> * One way is to use --py-files (suitable for simple dependencies, but not
> for complicated ones, especially those with transitive dependencies)
> * Another way is to install packages manually on each node (time-wasting,
> and it is not easy to switch between different environments)
> Python now has two different virtualenv implementations: one is the native
> virtualenv, the other is conda. This JIRA is trying to bring these two
> tools to the distributed environment.
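(To make the idea concrete, here is a minimal sketch, not the actual SPARK-13587 design, of what an executor-side bootstrap could do with the native venv/virtualenv tool: create an isolated environment on the fly and launch the Python worker from it. All names here are hypothetical.)

```python
import os
import subprocess
import tempfile
import venv

# Hypothetical sketch: create a throwaway per-job environment, as an
# executor-side bootstrap might before starting the Python worker.
env_dir = os.path.join(tempfile.mkdtemp(), "pyspark_env")
venv.create(env_dir, with_pip=False)  # with_pip=True would also bootstrap pip

bin_dir = "Scripts" if os.name == "nt" else "bin"
env_python = os.path.join(env_dir, bin_dir, "python")

# The worker process would be started with env_python instead of the
# system interpreter, giving each job its own package set.
out = subprocess.check_output([env_python, "-c", "print('ok')"])
print(out.decode().strip())  # → ok
```

The conda variant would be analogous, shelling out to `conda create` instead of using the stdlib venv module.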
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)