[
https://issues.apache.org/jira/browse/SPARK-8646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614885#comment-14614885
]
Sean Owen commented on SPARK-8646:
----------------------------------
Right, none of this uses pandas directly. As [~vanzin] says, the code appears to
be careful to call "import pandas" only when needed, as in {{toPandas()}}, or
to catch the error when it's not available. My guess is that {{has_pandas}}
is true on the driver, which causes it to do things that the executors
can't honor since they don't have pandas.
It does sound like a docs issue. Some PySpark operations need pandas, and you
need a uniform Python installation across driver and executors -- either both
have it or both don't. I suppose that's always good practice, but it's not
obvious that a mismatch could manifest like this.
How about adding some docs?
Or [~davies] et al., is there a better way to guard this? Rather than checking
once whether pandas can be imported, check at "runtime" in the createDataFrame
method, kind of like {{toPandas}} does?
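
To make the suggestion concrete, here is a rough sketch of the two guarding styles. It is not PySpark's actual code: {{module_available}}, {{HAS_PANDAS}} and {{describe_input}} are hypothetical names used only for illustration, and {{describe_input}} merely stands in for a method such as createDataFrame.

```python
import importlib


def module_available(name):
    """Return True if `name` can be imported in the current interpreter."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


# Style 1: checked once, at import time. The flag only describes the
# process that imported this module (for example, the driver).
HAS_PANDAS = module_available("pandas")


def describe_input(data):
    """Hypothetical stand-in for a method like createDataFrame.

    Style 2: the check runs on every call, in whichever process
    executes the call, instead of relying on the cached flag above.
    """
    if module_available("pandas"):
        import pandas  # deferred import, only reached when pandas exists

        if isinstance(data, pandas.DataFrame):
            return "pandas.DataFrame"
    return type(data).__name__


if __name__ == "__main__":
    print(HAS_PANDAS, describe_input([1, 2, 3]))
```

A per-call check costs a little more than a cached flag, but its answer always reflects the interpreter that is actually running the call.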
> PySpark does not run on YARN
> ----------------------------
>
> Key: SPARK-8646
> URL: https://issues.apache.org/jira/browse/SPARK-8646
> Project: Spark
> Issue Type: Bug
> Components: PySpark, YARN
> Affects Versions: 1.4.0
> Environment: SPARK_HOME=local/path/to/spark1.4install/dir
> also with
> SPARK_HOME=local/path/to/spark1.4install/dir
> PYTHONPATH=$SPARK_HOME/python/lib
> Spark apps are submitted with the command:
> $SPARK_HOME/bin/spark-submit outofstock/data_transform.py
> hdfs://foe-dev/DEMO_DATA/FACT_POS hdfs:/user/juliet/ex/ yarn-client
> data_transform contains a main method, and the rest of the args are parsed in
> my own code.
> Reporter: Juliet Hougland
> Attachments: pi-test.log, spark1.4-SPARK_HOME-set-PYTHONPATH-set.log,
> spark1.4-SPARK_HOME-set-inline-HADOOP_CONF_DIR.log,
> spark1.4-SPARK_HOME-set.log
>
>
> Running pyspark jobs results in a "no module named pyspark" error when run in
> yarn-client mode in spark 1.4.
> [I believe this JIRA represents the change that introduced this error.|
> https://issues.apache.org/jira/browse/SPARK-6869 ]
> This does not represent a binary compatible change to spark. Scripts that
> worked on previous spark versions (i.e. commands that use spark-submit) should
> continue to work without modification between minor versions.