Hello,
I am trying to use Toree to develop in Scala with a Jupyter notebook. Our setup is Amazon EMR with Spark 2.2.0. My goal is to connect the Toree kernel to the Spark components that already ship with EMR and to set master = yarn-client.
I have installed JupyterHub and Toree, and was able to access Jupyter and create a new notebook using the Toree kernel, but the kernel crashes. Below are the steps I have taken to set everything up and an extract of the error message thrown by Jupyter / Toree. My guess is that Spark 2.2.0 is not supported by Toree and that there is a Scala version mismatch. Can you confirm / let me know if there is a way to use Toree with Spark 2.2.0 on EMR? Let me know if you have any questions.

Many thanks,
Raphael

sudo su -

# Python 3.4 and pip3
curl -O https://bootstrap.pypa.io/get-pip.py
/usr/bin/python3.4 get-pip.py

# Nodejs, npm and http-proxy
yum install nodejs npm --enablerepo=epel
npm install -g configurable-http-proxy

# Jupyter
pip3 install notebook
pip3 install jupyterhub

# Spark / scala kernel
mv /usr/bin/java /usr/bin/java.ori
ln -s /usr/lib/jvm/java-1.8.0-openjdk.x86_64/bin/java /usr/bin/java
wget http://downloads.typesafe.com/scala/2.12.4/scala-2.12.4.rpm
rpm -ivh scala-2.12.4.rpm
scala -version
rm -f scala-2.12.4.rpm

# environment variables
vi /etc/profile
insert ===>>>
#####################################################################
export SPARK_HOME="/usr/lib/spark"
echo "SPARK_HOME=$SPARK_HOME"
export JAVA_HOME="/usr/lib/jvm/java-1.8.0-openjdk.x86_64"
echo "JAVA_HOME=$JAVA_HOME"
export SCALA_HOME="/usr/share/scala"
echo "SCALA_HOME=$SCALA_HOME"
export YARN_CONF_DIR="/etc/hadoop/conf"
echo "YARN_CONF_DIR=$YARN_CONF_DIR"
export HADOOP_CONF_DIR="/etc/hadoop/conf"
echo "HADOOP_CONF_DIR=$HADOOP_CONF_DIR"
<<<===
source /etc/profile

# install Toree
pip3 install toree
jupyter toree install --spark_home=$SPARK_HOME --interpreters=Scala,SQL --spark_opts='--master yarn' --python=/usr/bin/python3

# Start jupyterhub
logout
jupyterhub

===> I can access Jupyter and open a new Toree notebook, but the kernel fails with this error:
(the odd thing is that the Scala version reported is 2.10.4, not the installed 2.12.4)

18/02/13 05:36:53 INFO Main$$anon$1: Kernel version: 0.1.0-incubating
18/02/13 05:36:53 INFO Main$$anon$1: Scala version: Some(2.10.4)
18/02/13 05:36:53 INFO Main$$anon$1: ZeroMQ (JeroMQ) version: 3.2.5
18/02/13 05:36:53 INFO Main$$anon$1: Initializing internal actor system
Exception in thread "main" java.lang.NoSuchMethodError: scala.collection.immutable.HashSet$.empty()Lscala/collection/immutable/HashSet;
    at akka.actor.ActorCell$.<init>(ActorCell.scala:336)
    at akka.actor.ActorCell$.<clinit>(ActorCell.scala)
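
To make the version-mismatch guess concrete, here is a minimal sketch that compares the Scala binary version (major.minor) reported by the kernel in the log above with the one Spark was built against. The helper names `scala_binary_version` and `same_scala_binary` are hypothetical and not part of Toree or Spark. The Spark-side value is an assumption: it is a placeholder that should be replaced with whatever the "Using Scala version ..." line of `spark-submit --version` shows on the cluster.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of Toree or Spark.

# Reduce a full Scala version (e.g. 2.10.4) to its binary version (2.10).
scala_binary_version() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Succeed only when both versions share the same binary version.
same_scala_binary() {
    [ "$(scala_binary_version "$1")" = "$(scala_binary_version "$2")" ]
}

# Scala version the Toree kernel printed in its startup log.
kernel_scala="2.10.4"

# ASSUMPTION: placeholder value. Replace it with the version shown by
# `spark-submit --version` on the EMR master node.
spark_scala="2.11.8"

if same_scala_binary "$kernel_scala" "$spark_scala"; then
    echo "match: $(scala_binary_version "$kernel_scala")"
else
    echo "mismatch: kernel $(scala_binary_version "$kernel_scala") vs Spark $(scala_binary_version "$spark_scala")"
fi
```

If the two binary versions differ, that would be consistent with the NoSuchMethodError above, since Scala libraries compiled for one binary version are generally not compatible with another. The locally installed Scala 2.12.4 would not enter into this comparison if the kernel and Spark each bundle their own Scala libraries, which the 2.10.4 in the log suggests.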