You might try enabling tracing (probably via log4j) since you can't yet enable it via the context.
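
As a rough sketch of what that could look like (assuming the conf directory under the SPARK_HOME your kernel uses; copy log4j.properties.template to log4j.properties if it doesn't exist yet):

    # $SPARK_HOME/conf/log4j.properties
    # Raise the root logger to DEBUG so the driver/gateway launch is verbose.
    log4j.rootCategory=DEBUG, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

The extra output should land wherever the kernel's stderr goes (typically the notebook server's log), which may show why the JVM side never comes up.
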
As a point of reference, it might be helpful to check out the sample kernelspecs <https://github.com/jupyter/enterprise_gateway/tree/master/etc/kernelspecs> provided by Enterprise Gateway, although these are based on HDP rather than Cloudera. The primary differences are that EG embeds the kernels in wrappers that handle lifecycle and Spark context creation, and that it introduces the ability to distribute kernels, but those pieces are not required. Using the run.sh approach, however, might give you better troubleshooting capabilities; a rough sketch of what that can look like is included after the quoted message below.

On Monday, October 15, 2018 at 8:43:59 AM UTC-7, Pasle Choix wrote:
>
> I am struggling to get Spark 2.3 working in Jupyter Notebook.
>
> Currently I have a kernel created as below:
>
> 1. Create an environment file:
>
> ~]$ cat rxie20181012-pyspark.yml
> name: rxie20181012-pyspark
> dependencies:
>   - pyspark
>
> 2. Create an environment from the environment file:
>
> conda env create -f rxie20181012-pyspark.yml
>
> 3. Activate the new environment:
>
> source activate rxie20181012-pyspark
>
> 4. Create a kernel based on the conda env:
>
> sudo ./python -m ipykernel install --name rxie20181012-pyspark --display-name "Python (rxie20181012-pyspark)"
>
> 5. kernel.json is as below:
>
> cat /usr/local/share/jupyter/kernels/rxie20181012-pyspark/kernel.json
> {
>   "display_name": "Python (rxie20181012-pyspark)",
>   "language": "python",
>   "argv": [
>     "/opt/cloudera/parcels/Anaconda-4.2.0/bin/python",
>     "-m",
>     "ipykernel",
>     "-f",
>     "{connection_file}"
>   ]
> }
>
> 6. After noticing the notebook failed on "import pyspark", I added an env section to the kernel.json:
>
> {
>   "display_name": "Python (rxie20181012-pyspark)",
>   "language": "python",
>   "argv": [
>     "/opt/cloudera/parcels/Anaconda-4.2.0/bin/python",
>     "-m",
>     "ipykernel",
>     "-f",
>     "{connection_file}"
>   ],
>   "env": {
>     "HADOOP_CONF_DIR": "/etc/spark2/conf/yarn-conf",
>     "PYSPARK_PYTHON": "/opt/cloudera/parcels/Anaconda/bin/python",
>     "SPARK_HOME": "/opt/cloudera/parcels/SPARK2",
>     "PYTHONPATH": "/opt/cloudera/parcels/SPARK2/lib/spark2/python/lib/py4j-0.10.7-src.zip:/opt/cloudera/parcels/SPARK2/lib/spark2/python/",
>     "PYTHONSTARTUP": "/opt/cloudera/parcels/SPARK2/lib/spark2/python/pyspark/shell.py",
>     "PYSPARK_SUBMIT_ARGS": " --master yarn --deploy-mode client pyspark-shell"
>   }
> }
>
> Now there is no more error on "import pyspark", but I am still not able to start a SparkSession:
>
> import pyspark
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.appName('abc').getOrCreate()
>
> OSError                                   Traceback (most recent call last)
> <ipython-input-2-f2a61cc0323d> in <module>()
> ----> 1 spark = SparkSession.builder.appName('abc').getOrCreate()
>
> /opt/cloudera/parcels/SPARK2/lib/spark2/python/pyspark/sql/session.pyc in getOrCreate(self)
>     171         for key, value in self._options.items():
>     172             sparkConf.set(key, value)
> --> 173         sc = SparkContext.getOrCreate(sparkConf)
>     174         # This SparkContext may be an existing one.
>     175         for key, value in self._options.items():
>
> /opt/cloudera/parcels/SPARK2/lib/spark2/python/pyspark/context.pyc in getOrCreate(cls, conf)
>     341         with SparkContext._lock:
>     342             if SparkContext._active_spark_context is None:
> --> 343                 SparkContext(conf=conf or SparkConf())
>     344             return SparkContext._active_spark_context
>     345
>
> /opt/cloudera/parcels/SPARK2/lib/spark2/python/pyspark/context.pyc in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
>     113         """
>     114         self._callsite = first_spark_call() or CallSite(None, None, None)
> --> 115         SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
>     116         try:
>     117             self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
>
> /opt/cloudera/parcels/SPARK2/lib/spark2/python/pyspark/context.pyc in _ensure_initialized(cls, instance, gateway, conf)
>     290         with SparkContext._lock:
>     291             if not SparkContext._gateway:
> --> 292                 SparkContext._gateway = gateway or launch_gateway(conf)
>     293                 SparkContext._jvm = SparkContext._gateway.jvm
>     294
>
> /opt/cloudera/parcels/SPARK2/lib/spark2/python/pyspark/java_gateway.pyc in launch_gateway(conf)
>      81             def preexec_func():
>      82                 signal.signal(signal.SIGINT, signal.SIG_IGN)
> ---> 83             proc = Popen(command, stdin=PIPE, preexec_fn=preexec_func, env=env)
>      84         else:
>      85             # preexec_fn not supported on Windows
>
> /opt/cloudera/parcels/Anaconda/lib/python2.7/subprocess.pyc in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags)
>     709                                 p2cread, p2cwrite,
>     710                                 c2pread, c2pwrite,
> --> 711                                 errread, errwrite)
>     712         except Exception:
>     713             # Preserve original exception in case os.close raises.
>
> /opt/cloudera/parcels/Anaconda/lib/python2.7/subprocess.pyc in _execute_child(self, args, executable, preexec_fn, close_fds, cwd, env, universal_newlines, startupinfo, creationflags, shell, to_close, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite)
>    1341                 raise
>    1342             child_exception = pickle.loads(data)
> -> 1343             raise child_exception
>    1344
>    1345
>
> OSError: [Errno 2] No such file or directory
>
> Can anyone help me sort it out, please? Thank you from the bottom of my heart.
>
> Pasle
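
Regarding the run.sh approach mentioned above: the idea is that kernel.json's "argv" points at a small shell wrapper instead of launching python directly, so the environment and any launch failure get logged before the kernel (and Spark) start. The EG scripts go further and launch the kernel through spark-submit, so treat the following only as an illustrative sketch with example paths, not the actual EG files:

    #!/usr/bin/env bash
    # Example location: /usr/local/share/jupyter/kernels/rxie20181012-pyspark/bin/run.sh
    # Trace every command and dump the relevant environment to stderr so a
    # failed launch leaves something in the notebook server's log.
    set -x
    echo "SPARK_HOME=${SPARK_HOME}" >&2
    echo "HADOOP_CONF_DIR=${HADOOP_CONF_DIR}" >&2
    echo "PYSPARK_SUBMIT_ARGS=${PYSPARK_SUBMIT_ARGS}" >&2
    # Quick sanity check that SPARK_HOME actually contains a Spark installation.
    ls -l "${SPARK_HOME}/bin/spark-submit" >&2

    # Launch the kernel exactly as the current kernel.json does;
    # "$@" carries the "-f {connection_file}" arguments through.
    exec /opt/cloudera/parcels/Anaconda-4.2.0/bin/python -m ipykernel "$@"

In kernel.json, the first element of "argv" would then be the path to run.sh (keeping "-f" and "{connection_file}" as the remaining arguments), with the "env" block left unchanged.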
