I have installed Toree in my Jupyter environment
(https://github.com/apache/incubator-toree) and written a piece of code that
works with PySpark. YARN starts properly and I can see the containers running
in the queue.
When I run the code, I get the following error:
Error from python worker: /usr/local/bin/python2.7: No module named pyspark
The kernel is set up as follows:
    {
      "language": "python",
      "display_name": "Apache Toree - PySpark",
      "env": {
        "__TOREE_SPARK_OPTS__": " --master yarn",
        "SPARK_HOME": "/usr/hdp/2.4.2.0-258/spark",
        "__TOREE_OPTS__": "",
        "DEFAULT_INTERPRETER": "PySpark",
        "PYTHONPATH": "/usr/hdp/2.4.2.0-258/spark/python:/usr/hdp/2.4.2.0-258/spark/python/lib/py4j-0.9-src.zip",
        "PYTHON_EXEC": "python",
        "PYTHONSTARTUP": "/usr/hdp/2.4.2.0-258/spark/python/pyspark/shell.py",
        "PYSPARK_PYTHON": "/usr/local/bin/python2.7",
        "PYSPARK_DRIVER_PYTHON": "/usr/local/bin/python2.7"
      },
      "argv": [
        "/usr/local/share/jupyter/kernels/apache_toree_pyspark/bin/run.sh",
        "--profile",
        "{connection_file}"
      ]
    }
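
Since the error comes from a Python worker rather than the driver, one thing that might be worth checking is whether the PYTHONPATH entries from the kernel spec actually exist on the node where the check runs. The following is only a minimal sketch under that assumption: `missing_pythonpath_entries` is a hypothetical helper (not part of Toree or Spark), and it only inspects the local filesystem, so it would need to be run on each YARN node to say anything about the executors.

```python
import os


def missing_pythonpath_entries(kernel_spec):
    """Return the PYTHONPATH entries of a kernel spec that do not exist locally.

    Hypothetical helper for illustration; it only checks the local filesystem.
    """
    pythonpath = kernel_spec.get("env", {}).get("PYTHONPATH", "")
    # The kernel spec uses ':' as the separator (Linux-style paths).
    entries = [p for p in pythonpath.split(":") if p]
    return [p for p in entries if not os.path.exists(p)]


# PYTHONPATH value copied from the kernel spec in the question.
kernel_spec = {
    "env": {
        "PYTHONPATH": "/usr/hdp/2.4.2.0-258/spark/python:"
                      "/usr/hdp/2.4.2.0-258/spark/python/lib/py4j-0.9-src.zip",
    }
}

for path in missing_pythonpath_entries(kernel_spec):
    print("missing on this node:", path)
```

If nothing is printed on the driver node but paths are reported missing on a worker node, that would be consistent with the worker's `/usr/local/bin/python2.7` not finding `pyspark`; it would not by itself prove that this is the cause.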