I am struggling to get Spark 2.3 working in Jupyter Notebook.
Currently I have a kernel created as below:
1. Create an environment file:
~]$ cat rxie20181012-pyspark.yml
name: rxie20181012-pyspark
dependencies:
- pyspark
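Incidentally, since the kernel is supposed to run from this env, I believe ipykernel would also have to be installed in it (that is my assumption, as is pinning Python 2.7 to match the cluster's interpreter); a fuller file might look like:
~]$ cat rxie20181012-pyspark.yml
name: rxie20181012-pyspark
dependencies:
  - python=2.7
  - ipykernel
  - pyspark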
2. Create an environment based on the environment file:
conda env create -f rxie20181012-pyspark.yml
3. Activate the new environment:
source activate rxie20181012-pyspark
4. Create a kernel based on the conda env:
sudo ./python -m ipykernel install --name rxie20181012-pyspark --display-name "Python (rxie20181012-pyspark)"
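To confirm the registration, the standard checks would be something like the following (jupyter kernelspec list is a stock Jupyter command; the expected results in the comments are my assumptions):
source activate rxie20181012-pyspark
which python                 # should resolve inside the rxie20181012-pyspark env
python -c "import ipykernel" # should not raise if ipykernel is in the env
jupyter kernelspec list      # should list rxie20181012-pyspark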
5. The resulting kernel.json:
cat /usr/local/share/jupyter/kernels/rxie20181012-pyspark/kernel.json
{
  "display_name": "Python (rxie20181012-pyspark)",
  "language": "python",
  "argv": [
    "/opt/cloudera/parcels/Anaconda-4.2.0/bin/python",
    "-m",
    "ipykernel",
    "-f",
    "{connection_file}"
  ]
}
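One thing I am not sure about: the argv above points at the base Anaconda interpreter rather than the interpreter of the new env. If the kernel is meant to use the env, I would expect the path to look more like the output of (the envs/ location is my assumption about the default conda layout):
source activate rxie20181012-pyspark
which python   # e.g. /opt/cloudera/parcels/Anaconda-4.2.0/envs/rxie20181012-pyspark/bin/python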
6. After noticing that the notebook failed on import pyspark, I added an env section to the kernel.json as below:
{
  "display_name": "Python (rxie20181012-pyspark)",
  "language": "python",
  "argv": [
    "/opt/cloudera/parcels/Anaconda-4.2.0/bin/python",
    "-m",
    "ipykernel",
    "-f",
    "{connection_file}"
  ],
  "env": {
    "HADOOP_CONF_DIR": "/etc/spark2/conf/yarn-conf",
    "PYSPARK_PYTHON": "/opt/cloudera/parcels/Anaconda/bin/python",
    "SPARK_HOME": "/opt/cloudera/parcels/SPARK2",
    "PYTHONPATH": "/opt/cloudera/parcels/SPARK2/lib/spark2/python/lib/py4j-0.10.7-src.zip:/opt/cloudera/parcels/SPARK2/lib/spark2/python/",
    "PYTHONSTARTUP": "/opt/cloudera/parcels/SPARK2/lib/spark2/python/pyspark/shell.py",
    "PYSPARK_SUBMIT_ARGS": " --master yarn --deploy-mode client pyspark-shell"
  }
}
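For what it is worth, a plain existence check on the paths wired into env (nothing Jupyter-specific, just ls):
ls -d /etc/spark2/conf/yarn-conf
ls /opt/cloudera/parcels/Anaconda/bin/python
ls /opt/cloudera/parcels/SPARK2/lib/spark2/python/lib/py4j-0.10.7-src.zip
ls /opt/cloudera/parcels/SPARK2/lib/spark2/python/pyspark/shell.py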
Now there is no more error on import pyspark, but I am still unable to start a SparkSession:
import pyspark
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('abc').getOrCreate()
OSError                                   Traceback (most recent call last)
<ipython-input-2-f2a61cc0323d> in <module>()
----> 1 spark = SparkSession.builder.appName('abc').getOrCreate()

/opt/cloudera/parcels/SPARK2/lib/spark2/python/pyspark/sql/session.pyc in getOrCreate(self)
    171                     for key, value in self._options.items():
    172                         sparkConf.set(key, value)
--> 173                     sc = SparkContext.getOrCreate(sparkConf)
    174                     # This SparkContext may be an existing one.
    175                     for key, value in self._options.items():

/opt/cloudera/parcels/SPARK2/lib/spark2/python/pyspark/context.pyc in getOrCreate(cls, conf)
    341         with SparkContext._lock:
    342             if SparkContext._active_spark_context is None:
--> 343                 SparkContext(conf=conf or SparkConf())
    344             return SparkContext._active_spark_context
    345

/opt/cloudera/parcels/SPARK2/lib/spark2/python/pyspark/context.pyc in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
    113         """
    114         self._callsite = first_spark_call() or CallSite(None, None, None)
--> 115         SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
    116         try:
    117             self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,

/opt/cloudera/parcels/SPARK2/lib/spark2/python/pyspark/context.pyc in _ensure_initialized(cls, instance, gateway, conf)
    290         with SparkContext._lock:
    291             if not SparkContext._gateway:
--> 292                 SparkContext._gateway = gateway or launch_gateway(conf)
    293                 SparkContext._jvm = SparkContext._gateway.jvm
    294

/opt/cloudera/parcels/SPARK2/lib/spark2/python/pyspark/java_gateway.pyc in launch_gateway(conf)
     81             def preexec_func():
     82                 signal.signal(signal.SIGINT, signal.SIG_IGN)
---> 83             proc = Popen(command, stdin=PIPE, preexec_fn=preexec_func, env=env)
     84         else:
     85             # preexec_fn not supported on Windows

/opt/cloudera/parcels/Anaconda/lib/python2.7/subprocess.pyc in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags)
    709                                 p2cread, p2cwrite,
    710                                 c2pread, c2pwrite,
--> 711                                 errread, errwrite)
    712         except Exception:
    713             # Preserve original exception in case os.close raises.

/opt/cloudera/parcels/Anaconda/lib/python2.7/subprocess.pyc in _execute_child(self, args, executable, preexec_fn, close_fds, cwd, env, universal_newlines, startupinfo, creationflags, shell, to_close, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite)
   1341                     raise
   1342                 child_exception = pickle.loads(data)
-> 1343                 raise child_exception
   1344
   1345

OSError: [Errno 2] No such file or directory
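One thing I am wondering while reading the trace: in Spark 2.3's java_gateway.py, the command handed to Popen at line 83 is built from SPARK_HOME, roughly os.path.join(SPARK_HOME, "./bin/spark-submit"), so Errno 2 there would mean there is no spark-submit at that spot. With the kernel's SPARK_HOME of /opt/cloudera/parcels/SPARK2, the two candidate locations could be compared with:
ls /opt/cloudera/parcels/SPARK2/bin/spark-submit
# vs. the parcel's actual Spark tree, the one the PYTHONPATH above points into:
ls /opt/cloudera/parcels/SPARK2/lib/spark2/bin/spark-submit
Should SPARK_HOME instead point at /opt/cloudera/parcels/SPARK2/lib/spark2 here? (That is my guess based on the PYTHONPATH/PYTHONSTARTUP values above, not something I have confirmed.)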
Can anyone help me sort this out, please? Thank you from the bottom of my heart.
Pasle