Sami Jaktholm created ZEPPELIN-1265:
---------------------------------------
Summary: Value of zeppelin.pyspark.python not reflected in the Python
version Spark executors use on YARN
Key: ZEPPELIN-1265
URL: https://issues.apache.org/jira/browse/ZEPPELIN-1265
Project: Zeppelin
Issue Type: Bug
Affects Versions: 0.5.6
Reporter: Sami Jaktholm
STR:
0. Have both python2 and python3 installed
1. Set {{zeppelin.pyspark.python}} to {{python3}} (or where your Python 3 is
installed)
2. Run some pyspark code that involves executing tasks on executor nodes
What happens:
bq. Exception: Python in worker has different version 3.4 than that in driver
2.7, PySpark cannot run with different minor versions
What should happen: The code runs without exceptions because the value of
{{zeppelin.pyspark.python}} is correctly propagated to the Spark executors. IMO
the correct behavior is to use the value of {{zeppelin.pyspark.python}} as the
Python interpreter with Spark and to set {{PYSPARK_PYTHON}} to that same value
so that Spark can pick it up and ship it to the executors.
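The proposed propagation could look roughly like the sketch below (illustrative only; {{apply_pyspark_python}} and the property-dict shape are hypothetical, not actual Zeppelin code):

```python
# Hypothetical sketch of the proposed fix: before the PySpark driver is
# launched, mirror zeppelin.pyspark.python into the PYSPARK_PYTHON
# environment variable so Spark ships the same interpreter to the
# YARN executors.
import os

def apply_pyspark_python(properties):
    """Mirror zeppelin.pyspark.python into PYSPARK_PYTHON (illustrative)."""
    python_bin = properties.get("zeppelin.pyspark.python", "python")
    os.environ["PYSPARK_PYTHON"] = python_bin
    return python_bin

# With zeppelin.pyspark.python set to python3, driver and executors
# would then agree on the interpreter.
interpreter = apply_pyspark_python({"zeppelin.pyspark.python": "python3"})
```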
The problem here is that when {{zeppelin.pyspark.python}} is set to use Python
3, Zeppelin starts the Spark driver process with python3. However, this
configuration is not reflected on the executors, which use whatever they find
in the {{PYSPARK_PYTHON}} environment variable, and that defaults to Python 2.
So changing the Python version also requires setting {{PYSPARK_PYTHON}} to the
correct value, which is not easy to do on the fly (I guess you need to change a
config file somewhere and restart Zeppelin to achieve that).
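Until this is fixed, the manual workaround implied above might look like the following sketch (assumes a standard Zeppelin install layout; paths and the interpreter name are examples, adjust for your setup):

```shell
# Workaround sketch: export PYSPARK_PYTHON in conf/zeppelin-env.sh so
# Spark propagates the same interpreter to the YARN executors, then
# restart Zeppelin to pick up the change.
echo 'export PYSPARK_PYTHON=python3' >> conf/zeppelin-env.sh
bin/zeppelin-daemon.sh restart
```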
This might only be an issue when running Spark on a YARN cluster with multiple
machines (i.e., the selected Python version is not propagated to the executor
machines), since I haven't tested this in a single-machine scenario. Also, I
haven't been able to test this in Zeppelin 0.6.0 yet, so this could already be
fixed, but I didn't find any similar tickets that were resolved after Zeppelin
0.5.6.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)