For a bit more context, this is the general way of starting Jupyter with PySpark support. In contrast, the usual `jupyter notebook` command will only launch Jupyter with a standard Python kernel.
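For example, once the notebook server is up this way, a fresh notebook already has a SparkContext bound to the name `sc` (pyspark creates it for you). A quick sanity check along the following lines should work; it assumes the `systemml` Python package is installed, and the tiny DML "hello" script is just an illustrative smoke test:

# run in a notebook cell; `sc` is created automatically by pyspark
print(sc.version)                                 # confirm Spark is up

import systemml as sml                            # assumes `pip install systemml`
from systemml import dml

ml = sml.MLContext(sc)                            # same MLContext setup used later in this thread
ml.execute(dml('print("hello from SystemML")'))   # run a one-line DML script as a smoke test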
Additionally, all of the extra "conf" settings in that command refer to settings that could instead be placed in the standard `conf/spark-defaults.conf` file of your Spark installation, using spaces instead of the equals signs, in case you're already familiar with that approach (a sketch of such a file appears at the end of this message).

- Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.

> On Jul 5, 2017, at 2:14 PM, Niketan Pansare <npan...@us.ibm.com> wrote:
>
> Hi Gustavo,
>
> You can paste that code into the command line:
>
> $ PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark --master local[*] --conf "spark.driver.memory=12g" --conf spark.driver.maxResultSize=0 --conf spark.akka.frameSize=128 --conf spark.default.parallelism=100
>
> The above command tells "pyspark" that the Python driver is Jupyter. For more details, please see
> https://github.com/apache/spark/blob/master/bin/pyspark#L27
>
> Alternatively, you can follow Arijit's suggestion.
>
> Thanks,
>
> Niketan Pansare
> IBM Almaden Research Center
> E-mail: npansar At us.ibm.com
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>
> From: arijit chakraborty <ak...@hotmail.com>
> To: "dev@systemml.apache.org" <dev@systemml.apache.org>
> Date: 07/02/2017 04:22 AM
> Subject: Re: Install - Configure Jupyter Notebook
>
> Hi Gustavo,
>
> You can put the pyspark details in the Jupyter console itself:
>
> import os
> import sys
> import pandas as pd
> import numpy as np
>
> # point these at your local Spark installation
> spark_path = r"C:\spark"
> os.environ['SPARK_HOME'] = spark_path
> os.environ['HADOOP_HOME'] = spark_path
>
> # make the PySpark and Py4J libraries importable
> sys.path.append(spark_path + "/bin")
> sys.path.append(spark_path + "/python")
> sys.path.append(spark_path + "/python/pyspark/")
> sys.path.append(spark_path + "/python/lib")
> sys.path.append(spark_path + "/python/lib/pyspark.zip")
> sys.path.append(spark_path + "/python/lib/py4j-0.10.4-src.zip")
>
> from pyspark import SparkContext
> from pyspark import SparkConf
>
> sc = SparkContext("local[*]", "test")
>
> # SystemML setup:
>
> from pyspark.sql import SQLContext
> import systemml as sml
> sqlCtx = SQLContext(sc)
> ml = sml.MLContext(sc)
>
> But this is not a very good way of doing it. I did it this way because I'm using Windows and it's easier to do it like that.
>
> Regards,
>
> Arijit
>
> ________________________________
> From: Gustavo Frederico <gustavo.freder...@thinkwrap.com>
> Sent: Sunday, July 2, 2017 10:16:03 AM
> To: dev@systemml.apache.org
> Subject: Install - Configure Jupyter Notebook
>
> A basic question: step 3 in https://systemml.apache.org/install-systemml.html for “Configure Jupyter Notebook” has
>
> # Start Jupyter Notebook Server
> PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark --master local[*] --conf "spark.driver.memory=12g" --conf spark.driver.maxResultSize=0 --conf spark.akka.frameSize=128 --conf spark.default.parallelism=100
>
> Where does that go? There are no details in this step…
>
> Thanks
>
> Gustavo
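For reference, here is a sketch of what those same settings would look like in `conf/spark-defaults.conf`. The values are simply the ones from the command quoted above, so treat them as starting points to tune for your own machine:

# conf/spark-defaults.conf -- one "key value" pair per line,
# separated by whitespace rather than '='
spark.driver.memory        12g
spark.driver.maxResultSize 0
spark.akka.frameSize       128
spark.default.parallelism  100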