[
https://issues.apache.org/jira/browse/SPARK-13587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16512433#comment-16512433
]
Matt Mould edited comment on SPARK-13587 at 6/14/18 1:21 PM:
-------------------------------------------------------------
What is the current status of this ticket, please? This
[article|https://community.hortonworks.com/articles/104947/using-virtualenv-with-pyspark.html]
suggests that it is done, but it does not work for me with the following command:
{code:java}
spark-submit --deploy-mode cluster --master yarn \
  --py-files parallelisation_hack-0.1-py2.7.egg \
  --conf spark.pyspark.virtualenv.enabled=true \
  --conf spark.pyspark.virtualenv.type=native \
  --conf spark.pyspark.virtualenv.requirements=requirements.txt \
  --conf spark.pyspark.virtualenv.bin.path=virtualenv \
  --conf spark.pyspark.python=python3 \
  pyspark_poc_runner.py
{code}
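Because mail clients re-wrap long commands, it may help to see the same invocation as an explicit argument list. The following is only a sketch: the option names and values are copied from the command above, the helper name build_submit_args is hypothetical, and nothing here checks whether a given Spark build actually honours the spark.pyspark.virtualenv.* settings.

```python
import shlex

# Option names and values are copied verbatim from the command above.
# Whether a given Spark build honours the spark.pyspark.virtualenv.* keys
# is the open question of this ticket; this sketch does not verify it.
VIRTUALENV_CONF = {
    "spark.pyspark.virtualenv.enabled": "true",
    "spark.pyspark.virtualenv.type": "native",
    "spark.pyspark.virtualenv.requirements": "requirements.txt",
    "spark.pyspark.virtualenv.bin.path": "virtualenv",
    "spark.pyspark.python": "python3",
}


def build_submit_args(script, py_files, conf):
    """Hypothetical helper: return the spark-submit argv as a list."""
    args = [
        "spark-submit",
        "--deploy-mode", "cluster",
        "--master", "yarn",
        "--py-files", py_files,
    ]
    # One "--conf key=value" pair per setting, in insertion order.
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(script)
    return args


if __name__ == "__main__":
    argv = build_submit_args(
        "pyspark_poc_runner.py",
        "parallelisation_hack-0.1-py2.7.egg",
        VIRTUALENV_CONF,
    )
    # Print a single-line, shell-quoted command (shlex.join needs Python 3.8+).
    # Passing argv to subprocess.run(argv) would submit the job instead.
    print(shlex.join(argv))
```

Keeping the settings in one dictionary makes it easy to toggle a single option when narrowing down which one the cluster ignores.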
> Support virtualenv in PySpark
> -----------------------------
>
> Key: SPARK-13587
> URL: https://issues.apache.org/jira/browse/SPARK-13587
> Project: Spark
> Issue Type: New Feature
> Components: PySpark
> Affects Versions: 1.6.3, 2.0.2, 2.1.2, 2.2.1, 2.3.0
> Reporter: Jeff Zhang
> Assignee: Jeff Zhang
> Priority: Major
>
> Currently, it is not easy for users to add third-party Python packages in
> PySpark.
> * One way is to use --py-files (suitable for simple dependencies, but not
> for complicated ones, especially those with transitive dependencies).
> * Another way is to install packages manually on each node (time-consuming,
> and it makes switching between environments hard).
> Python now has two different virtual environment implementations: one is
> native virtualenv, the other is conda. This JIRA aims to bring these two
> tools to a distributed environment.