You might try enabling tracing (probably via log4j) since you can't yet enable it via the context.
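
As a rough sketch of what that could look like (assuming the conf directory under the SPARK_HOME your kernel uses; copy log4j.properties.template to log4j.properties if it doesn't exist yet):

    # $SPARK_HOME/conf/log4j.properties
    # Raise the root logger to DEBUG so the driver/gateway launch is verbose.
    log4j.rootCategory=DEBUG, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

The extra output should land wherever the kernel's stderr goes (typically the notebook server's log), which may show why the JVM side never comes up.
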
As a point of reference, it might be helpful to check out the sample kernelspecs <https://github.com/jupyter/enterprise_gateway/tree/master/etc/kernelspecs> provided by Enterprise Gateway, although these are based on HDP rather than Cloudera. The primary differences are that EG embeds the kernels in wrappers that handle lifecycle and Spark context creation, and that it introduces the ability to distribute kernels, but those pieces are not required. Using the run.sh approach, however, might give you better troubleshooting capabilities; a rough sketch of what that can look like is included after the quoted message below.

On Monday, October 15, 2018 at 8:43:59 AM UTC-7, Pasle Choix wrote:
>
> I am struggling to get Spark 2.3 working in Jupyter Notebook.
>
> Currently I have a kernel created as below:
>
> 1. Create an environment file:
>
> ~]$ cat rxie20181012-pyspark.yml
> name: rxie20181012-pyspark
> dependencies:
>   - pyspark
>
> 2. Create an environment from the environment file:
>
> conda env create -f rxie20181012-pyspark.yml
>
> 3. Activate the new environment:
>
> source activate rxie20181012-pyspark
>
> 4. Create a kernel based on the conda env:
>
> sudo ./python -m ipykernel install --name rxie20181012-pyspark --display-name "Python (rxie20181012-pyspark)"
>
> 5. kernel.json is as below:
>
> cat /usr/local/share/jupyter/kernels/rxie20181012-pyspark/kernel.json
> {
>   "display_name": "Python (rxie20181012-pyspark)",
>   "language": "python",
>   "argv": [
>     "/opt/cloudera/parcels/Anaconda-4.2.0/bin/python",
>     "-m",
>     "ipykernel",
>     "-f",
>     "{connection_file}"
>   ]
> }
>
> 6. After noticing the notebook failed on "import pyspark", I added an env section to the kernel.json:
>
> {
>   "display_name": "Python (rxie20181012-pyspark)",
>   "language": "python",
>   "argv": [
>     "/opt/cloudera/parcels/Anaconda-4.2.0/bin/python",
>     "-m",
>     "ipykernel",
>     "-f",
>     "{connection_file}"
>   ],
>   "env": {
>     "HADOOP_CONF_DIR": "/etc/spark2/conf/yarn-conf",
>     "PYSPARK_PYTHON": "/opt/cloudera/parcels/Anaconda/bin/python",
>     "SPARK_HOME": "/opt/cloudera/parcels/SPARK2",
>     "PYTHONPATH": "/opt/cloudera/parcels/SPARK2/lib/spark2/python/lib/py4j-0.10.7-src.zip:/opt/cloudera/parcels/SPARK2/lib/spark2/python/",
>     "PYTHONSTARTUP": "/opt/cloudera/parcels/SPARK2/lib/spark2/python/pyspark/shell.py",
>     "PYSPARK_SUBMIT_ARGS": " --master yarn --deploy-mode client pyspark-shell"
>   }
> }
>
> Now there is no more error on "import pyspark", but I am still not able to start a SparkSession:
>
> import pyspark
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.appName('abc').getOrCreate()
>
> OSError                                   Traceback (most recent call last)
> <ipython-input-2-f2a61cc0323d> in <module>()
> ----> 1 spark = SparkSession.builder.appName('abc').getOrCreate()
>
> /opt/cloudera/parcels/SPARK2/lib/spark2/python/pyspark/sql/session.pyc in getOrCreate(self)
>     171         for key, value in self._options.items():
>     172             sparkConf.set(key, value)
> --> 173         sc = SparkContext.getOrCreate(sparkConf)
>     174         # This SparkContext may be an existing one.
>     175         for key, value in self._options.items():
>
> /opt/cloudera/parcels/SPARK2/lib/spark2/python/pyspark/context.pyc in getOrCreate(cls, conf)
>     341         with SparkContext._lock:
>     342             if SparkContext._active_spark_context is None:
> --> 343                 SparkContext(conf=conf or SparkConf())
>     344             return SparkContext._active_spark_context
>     345
>
> /opt/cloudera/parcels/SPARK2/lib/spark2/python/pyspark/context.pyc in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
>     113         """
>     114         self._callsite = first_spark_call() or CallSite(None, None, None)
> --> 115         SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
>     116         try:
>     117             self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
>
> /opt/cloudera/parcels/SPARK2/lib/spark2/python/pyspark/context.pyc in _ensure_initialized(cls, instance, gateway, conf)
>     290         with SparkContext._lock:
>     291             if not SparkContext._gateway:
> --> 292                 SparkContext._gateway = gateway or launch_gateway(conf)
>     293                 SparkContext._jvm = SparkContext._gateway.jvm
>     294
>
> /opt/cloudera/parcels/SPARK2/lib/spark2/python/pyspark/java_gateway.pyc in launch_gateway(conf)
>      81             def preexec_func():
>      82                 signal.signal(signal.SIGINT, signal.SIG_IGN)
> ---> 83             proc = Popen(command, stdin=PIPE, preexec_fn=preexec_func, env=env)
>      84         else:
>      85             # preexec_fn not supported on Windows
>
> /opt/cloudera/parcels/Anaconda/lib/python2.7/subprocess.pyc in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags)
>     709                                 p2cread, p2cwrite,
>     710                                 c2pread, c2pwrite,
> --> 711                                 errread, errwrite)
>     712         except Exception:
>     713             # Preserve original exception in case os.close raises.
>
> /opt/cloudera/parcels/Anaconda/lib/python2.7/subprocess.pyc in _execute_child(self, args, executable, preexec_fn, close_fds, cwd, env, universal_newlines, startupinfo, creationflags, shell, to_close, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite)
>    1341                 raise
>    1342             child_exception = pickle.loads(data)
> -> 1343             raise child_exception
>    1344
>    1345
>
> OSError: [Errno 2] No such file or directory
>
> Can anyone help me sort it out, please? Thank you from the bottom of my heart.
>
> Pasle
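
Regarding the run.sh approach mentioned above: the idea is that kernel.json's "argv" points at a small shell wrapper instead of launching python directly, so the environment and any launch failure get logged before the kernel (and Spark) start. The EG scripts go further and launch the kernel through spark-submit, so treat the following only as an illustrative sketch with example paths, not the actual EG files:

    #!/usr/bin/env bash
    # Example location: /usr/local/share/jupyter/kernels/rxie20181012-pyspark/bin/run.sh
    # Trace every command and dump the relevant environment to stderr so a
    # failed launch leaves something in the notebook server's log.
    set -x
    echo "SPARK_HOME=${SPARK_HOME}" >&2
    echo "HADOOP_CONF_DIR=${HADOOP_CONF_DIR}" >&2
    echo "PYSPARK_SUBMIT_ARGS=${PYSPARK_SUBMIT_ARGS}" >&2
    # Quick sanity check that SPARK_HOME actually contains a Spark installation.
    ls -l "${SPARK_HOME}/bin/spark-submit" >&2

    # Launch the kernel exactly as the current kernel.json does;
    # "$@" carries the "-f {connection_file}" arguments through.
    exec /opt/cloudera/parcels/Anaconda-4.2.0/bin/python -m ipykernel "$@"

In kernel.json, the first element of "argv" would then be the path to run.sh (keeping "-f" and "{connection_file}" as the remaining arguments), with the "env" block left unchanged.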
