Eric Luo created LIVY-859:
-----------------------------
Summary: Start PySpark application Failed with Python 3.7.6
Key: LIVY-859
URL: https://issues.apache.org/jira/browse/LIVY-859
Project: Livy
Issue Type: Question
Environment: CDH: 6.3.2
Spark: 2.4.0
Python: 3.7.6
Livy: 0.7.0-incubating
Reporter: Eric Luo
I have two CDH clusters (stage and prod) with the same environment, the same
Livy service installed on both, and a Python 3.7 environment configured via
Ansible. I write PySpark code in jupyter-lab, calling Livy through sparkmagic.
The stage environment works fine, but the prod environment gives an error.
Error log:
{code:java}
21/05/20 14:43:45 INFO driver.SparkEntries: Created Spark session (with Hive
support).
21/05/20 14:43:50 ERROR repl.PythonInterpreter: Process has died with 134
21/05/20 14:43:50 ERROR repl.PythonInterpreter: Fatal Python error:
initfsencoding: unable to load the file system codec
ModuleNotFoundError: No module named 'encodings'{code}
On a prod environment machine, the configured PYSPARK_PYTHON
(/opt/miniconda/bin/python) can actually {{import encodings}} without any issue,
and I can run the pyspark shell directly without any problems.
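For reference, this is the kind of check I mean; run with /opt/miniconda/bin/python on the prod machine it succeeds (the snippet itself is generic Python):

{code:python}
# Confirm the interpreter can locate the stdlib 'encodings' package --
# its absence is what triggers the "initfsencoding: unable to load the
# file system codec" fatal error in the log above.
import encodings
import sys

print(sys.executable)      # which interpreter is actually running
print(encodings.__file__)  # where the codec package was found
{code}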
In CDH's "Spark Service Advanced Configuration Snippet (Safety Valve) for
spark-conf/spark-env.sh", I configured PYSPARK_PYTHON and
PYSPARK_DRIVER_PYTHON:
{code:bash}
export PYSPARK_PYTHON=${PYSPARK_PYTHON:-/opt/miniconda/bin/python}
export PYSPARK_DRIVER_PYTHON=${PYSPARK_DRIVER_PYTHON:-/opt/miniconda/bin/python}
{code}
livy-env.sh also configures PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON:
{code:bash}
PYSPARK_PYTHON=/opt/miniconda/bin/python
PYSPARK_DRIVER_PYTHON=/opt/miniconda/bin/python
JAVA_HOME=/usr/java/jdk1.8.0_181-cloudera/
HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
HADOOP_CONF_DIR=/etc/hadoop/conf
SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
{code}
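To narrow down whether the Livy-launched driver actually picks up this interpreter, a small probe can be submitted through the sparkmagic session. This is only a diagnostic sketch, not something from my logs:

{code:python}
# Run inside the Livy PySpark session: show which Python the driver is
# using and the environment variables that affect stdlib lookup.
import os
import sys

print(sys.executable)                    # interpreter Livy actually started
print(sys.prefix)                        # where it expects the stdlib
print(os.environ.get("PYSPARK_PYTHON"))  # value propagated from livy-env.sh
print(os.environ.get("PYTHONHOME"))      # a stale PYTHONHOME commonly breaks 'encodings'
{code}

If {{sys.executable}} is not /opt/miniconda/bin/python there, the livy-env.sh settings are not reaching the driver process.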
--
This message was sent by Atlassian Jira
(v8.3.4#803005)