Hello,

I am trying to use Toree to develop in Scala with a Jupyter notebook.
Our setup is Amazon EMR with Spark 2.2.0. My goal is to connect the Toree
kernel to the Spark components that already ship with EMR and to set
master = yarn-client.

I have installed JupyterHub and Toree, and I was able to access Jupyter and
create a new notebook using the Toree kernel, but the kernel crashes.
Below are the steps I took to set everything up, followed by an extract of
the error message thrown by Jupyter / Toree.

My guess is that Spark 2.2.0 is not supported with Toree and that there is a 
Scala version mismatch.

Can you confirm this, or let me know if there is a way to use Toree with
Spark 2.2.0 on EMR?

Let me know if you have any questions.

Many thanks,
Raphael


sudo su -

# Python 3.4 and pip3
curl -O https://bootstrap.pypa.io/get-pip.py
/usr/bin/python3.4 get-pip.py


# Nodejs, npm and http-proxy
yum install nodejs npm --enablerepo=epel
npm install -g configurable-http-proxy

# Jupyter
pip3 install notebook
pip3 install jupyterhub


# Spark / scala kernel
mv /usr/bin/java /usr/bin/java.ori
ln -s /usr/lib/jvm/java-1.8.0-openjdk.x86_64/bin/java /usr/bin/java

wget http://downloads.typesafe.com/scala/2.12.4/scala-2.12.4.rpm
rpm -ivh scala-2.12.4.rpm
scala -version
rm -f scala-2.12.4.rpm

# environment variables
vi /etc/profile
insert ===>>>
#####################################################################
export SPARK_HOME="/usr/lib/spark"
echo "SPARK_HOME=$SPARK_HOME"

export JAVA_HOME="/usr/lib/jvm/java-1.8.0-openjdk.x86_64"
echo "JAVA_HOME=$JAVA_HOME"

export SCALA_HOME="/usr/share/scala"
echo "SCALA_HOME=$SCALA_HOME"

export YARN_CONF_DIR="/etc/hadoop/conf"
echo "YARN_CONF_DIR=$YARN_CONF_DIR"

export HADOOP_CONF_DIR="/etc/hadoop/conf"
echo "HADOOP_CONF_DIR=$HADOOP_CONF_DIR"
<<<===
source /etc/profile
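
Before installing Toree, it may be worth checking that the variables exported
above point to directories that actually exist on the node. The helper below is
only a sketch; check_dir_vars is a hypothetical name, not part of Spark, EMR or
Toree.

```shell
# Hypothetical helper: report whether each named variable is set and
# points to an existing directory. Returns non-zero if any check fails.
check_dir_vars() {
    status=0
    for name in "$@"; do
        # Look up the value of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            status=1
        elif [ ! -d "$value" ]; then
            echo "$name=$value (directory not found)"
            status=1
        else
            echo "$name=$value"
        fi
    done
    return $status
}
```

Usage, with the variables from /etc/profile:
check_dir_vars SPARK_HOME JAVA_HOME SCALA_HOME YARN_CONF_DIR HADOOP_CONF_DIR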

# install Toree
pip3 install toree
jupyter toree install --spark_home=$SPARK_HOME --interpreters=Scala,SQL \
    --spark_opts='--master yarn' --python=/usr/bin/python3

# Start jupyterhub
logout
jupyterhub


===> I can access Jupyter and open a new Toree notebook, but the kernel fails
with the error below. Oddly, the Scala version reported is 2.10.4, not the
2.12.4 I installed.

18/02/13 05:36:53 INFO Main$$anon$1: Kernel version: 0.1.0-incubating
18/02/13 05:36:53 INFO Main$$anon$1: Scala version: Some(2.10.4)
18/02/13 05:36:53 INFO Main$$anon$1: ZeroMQ (JeroMQ) version: 3.2.5
18/02/13 05:36:53 INFO Main$$anon$1: Initializing internal actor system
Exception in thread "main" java.lang.NoSuchMethodError: scala.collection.immutable.HashSet$.empty()Lscala/collection/immutable/HashSet;
      at akka.actor.ActorCell$.<init>(ActorCell.scala:336)
      at akka.actor.ActorCell$.<clinit>(ActorCell.scala)
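
One way to check the version-mismatch guess is to compare the Scala version
reported by the kernel (2.10.4 in the log above) with the Scala library that
Spark itself ships. The sketch below assumes Spark keeps a jar named
scala-library-<version>.jar under $SPARK_HOME/jars, which may not hold on every
installation; spark_scala_version is a hypothetical helper name.

```shell
# Hypothetical helper: print the Scala version implied by the
# scala-library jar found in a Spark jars directory.
# Assumes the jar is named scala-library-<version>.jar.
spark_scala_version() {
    jars_dir="$1"
    for jar in "$jars_dir"/scala-library-*.jar; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$jar" ] || continue
        basename "$jar" .jar | sed 's/^scala-library-//'
        return 0
    done
    echo "no scala-library jar found in $jars_dir" >&2
    return 1
}
```

Usage: spark_scala_version "$SPARK_HOME/jars". If the result differs from the
version in the kernel log, the Toree build and the Spark build were probably
compiled against different Scala versions. The installed Toree release can be
checked with: pip3 show toree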
