Sami Jaktholm created ZEPPELIN-1265:
---------------------------------------
Summary: Value of zeppelin.pyspark.python not reflected in the Python
version Spark executors use on YARN
Key: ZEPPELIN-1265
URL: https://issues.apache.org/jira/browse/ZEPPELIN-1265
Project: Zeppelin
Issue Type: Bug
Affects Versions: 0.5.6
Reporter: Sami Jaktholm
STR:
0. Have both python2 and python3 installed
1. Set {{zeppelin.pyspark.python}} to {{python3}} (or where your Python 3 is
installed)
2. Run some pyspark code that involves executing tasks on executor nodes
What happens:
bq. Exception: Python in worker has different version 3.4 than that in driver
2.7, PySpark cannot run with different minor versions
What should happen: The code runs without exceptions because the value of
{{zeppelin.pyspark.python}} is correctly propagated to the Spark executors. IMO
the correct behavior is to use the value of {{zeppelin.pyspark.python}} as the
Python interpreter with Spark and to set {{PYSPARK_PYTHON}} to that same value
so that Spark can pick it up and ship it to the executors.
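The proposed propagation could look roughly like the sketch below (illustrative only; {{apply_pyspark_python}} and the property-dict shape are hypothetical, not actual Zeppelin code):

```python
# Hypothetical sketch of the proposed fix: before the PySpark driver is
# launched, mirror zeppelin.pyspark.python into the PYSPARK_PYTHON
# environment variable so Spark ships the same interpreter to the
# YARN executors.
import os

def apply_pyspark_python(properties):
    """Mirror zeppelin.pyspark.python into PYSPARK_PYTHON (illustrative)."""
    python_bin = properties.get("zeppelin.pyspark.python", "python")
    os.environ["PYSPARK_PYTHON"] = python_bin
    return python_bin

# With zeppelin.pyspark.python set to python3, driver and executors
# would then agree on the interpreter.
interpreter = apply_pyspark_python({"zeppelin.pyspark.python": "python3"})
```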
The problem here is that when {{zeppelin.pyspark.python}} is set to use Python
3, Zeppelin starts the Spark driver process with python3. However, this
configuration is not reflected on the executors, which use whatever they find
in the {{PYSPARK_PYTHON}} environment variable, and that defaults to Python 2.
So changing the Python version also requires setting {{PYSPARK_PYTHON}} to the
correct value, which is not easy to do on the fly (I guess you need to change a
config file somewhere and restart Zeppelin to achieve that).
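Until this is fixed, the manual workaround implied above might look like the following sketch (assumes a standard Zeppelin install layout; paths and the interpreter name are examples, adjust for your setup):

```shell
# Workaround sketch: export PYSPARK_PYTHON in conf/zeppelin-env.sh so
# Spark propagates the same interpreter to the YARN executors, then
# restart Zeppelin to pick up the change.
echo 'export PYSPARK_PYTHON=python3' >> conf/zeppelin-env.sh
bin/zeppelin-daemon.sh restart
```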
This might only be an issue when running Spark on a YARN cluster with multiple
machines (i.e., the selected Python version is not propagated to the executor
machines), since I haven't tested this in a single-machine scenario. Also, I
haven't been able to test this in Zeppelin 0.6.0 yet, so this could already be
fixed, but I didn't find any similar tickets that were resolved after Zeppelin
0.5.6.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)