[jupyter] Scala Kernel Discussion

Kyle Kelley Fri, 03 Mar 2017 17:15:06 -0800

On February 27, 2017 a group of us met to talk about Scala kernels and pave
a path forward for Scala users. A YouTube video of the discussion is
available here:

https://www.youtube.com/watch?v=0NRONVuct0E

What follows is a summary from the call, mostly in linear order from the
video itself.

Attendees

- Alexandre Archambault - Jupyter Scala, Ammonium
- Ryan Blue (Netflix) - Toree
- Gino Bustelo (IBM) - Toree
- Joy Chakraborty (Bloomberg) - Spark Magic with Livy
- Kyle Kelley (Netflix) - Jupyter
- Haley Most (Cloudera) - Toree
- Marius van Niekerk (Maxpoint) - Toree, Spylon
- Peter Parente (Maxpoint) - Jupyter
- Corey Stubbs (IBM) - Toree
- Jamie Whitacre (Berkeley) - Jupyter
- Tristan Zajonc (Cloudera) - Toree, Livy

Each person on the call has a preferred kernel and a preferred way of
building and integrating it. We have a significant user experience problem
in terms of users installing and using Scala kernels, beyond just Spark
usage. The overarching goal is to create a cohesive experience for Scala
users when they use Jupyter.

When a Scala user (or even a Python developer already familiar with
Jupyter) comes to the Jupyter ecosystem, they face many options for
kernels. Having to choose between them before getting anything done creates
new friction points for users. As examples see
https://twitter.com/chrisalbon/status/833156959150841856 and
https://twitter.com/sarah_guido/status/833165030296322049.

What are our foundations for REPL libraries in Scala?

Toree was built on top of the Spark REPL, and its developers tried to reuse
as much code as possible from Spark. For his jupyter-scala, Alex recognized
that the Spark REPL was changing a lot from version to version. At the same
time, Ammonite <https://github.com/lihaoyi/Ammonite> was created to assist
in Scala scripting. To make big data frameworks such as Spark, Flink, and
Scio work well in this environment, a fork called Ammonium
<https://github.com/alexarchambault/ammonium> was created. There is some
trepidation about using a separate fork as part of the kernel community. We
should make sure to unify with the originating Ammonite and contribute back
as part of a larger Scala community that can maintain these together.

Action Items:

- Renew focus on Scala within Toree, improve outward messaging about how
  Toree provides a Scala kernel
- Unify Ammonite and Ammonium ([email protected])
  - To be used in jupyter-scala, potentially for spylon

There is more than one implementation of the Jupyter protocol in the Java stack

Toree has one, jupyter-scala has one, and the Clojure kernels have their
own. People would like to see a stable Jupyter library for the JVM. Some
think it’s better to have one per language. Regardless of choice, we should
have a well-supported Jupyter library.

Action Items:

- Create an idiomatic Java library for the Jupyter messaging protocol, and
  propose this as an incubation project within Jupyter
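
To give a feel for what every one of these implementations has to get right,
here is a minimal sketch of the signing step of the Jupyter wire protocol. It
is written in Python for brevity, not as a proposal for the JVM library. The
function names and the sample header values are made up for illustration; the
frame layout (identities, the <IDS|MSG> delimiter, an HMAC-SHA256 hex
signature, then header, parent header, metadata and content as JSON) follows
the messaging spec as I understand it.

```python
import hashlib
import hmac
import json

DELIMITER = b"<IDS|MSG>"  # separates routing identities from the message


def sign(key, parts):
    """HMAC-SHA256 hex digest over the four serialized dicts, in order."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for part in parts:
        mac.update(part)
    return mac.hexdigest().encode("ascii")


def serialize(key, header, parent_header, metadata, content, identities=()):
    """Build the list of frames a kernel would send as a multipart message."""
    parts = [
        json.dumps(d, sort_keys=True).encode("utf-8")
        for d in (header, parent_header, metadata, content)
    ]
    return list(identities) + [DELIMITER, sign(key, parts)] + parts


def verify(key, frames):
    """Recompute the signature of received frames and compare it safely."""
    idx = frames.index(DELIMITER)
    signature = frames[idx + 1]
    parts = frames[idx + 2:idx + 6]
    return hmac.compare_digest(signature, sign(key, parts))


# Hypothetical values; a real key comes from the kernel's connection file.
key = b"hypothetical-connection-key"
header = {
    "msg_id": "hypothetical-msg-id",
    "session": "hypothetical-session",
    "username": "kernel",
    "date": "2017-02-27T00:00:00Z",
    "msg_type": "execute_request",
    "version": "5.0",
}
frames = serialize(key, header, {}, {}, {"code": "1 + 1"})
print(verify(key, frames))
```

A shared JVM library would own exactly this kind of detail (framing, signing,
serialization) so that each kernel does not have to reimplement it.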

Decouple Spark from Scala in kernels

Decouple the language-specific parts from the computing framework so that
other computing frameworks can be used. This is paramount for R and Python.
When we inevitably want to connect to a GPU cluster, we want to be able to
use the same foundations of a kernel. The reason that these end up being
coupled is that Spark does “slightly weird things” for how it wants its
classes compiled. It’s thought that there is some amount of specialization
and that we can work around it. At the very least, we can bake it into the
core and leave room for other frameworks to have solid built-in support if
necessary.

An approach being worked on in Toree right now is lazy loading of Spark.
One difference between jupyter-scala and Toree is that jupyter-scala can
dynamically load Spark versions, whereas Toree is bound to a version of
Spark on deployment. For end users who have operators/admins, a separate
kernel can be configured for each version of Spark it will use (common for
Python, R). Spark drives a lot of interest in Scala kernels, and many
kernels conflate the two. This results in poor messaging and experiences
for users getting started.

Action Items:

- Lazy load Spark within Toree
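
To make the "one kernel per Spark version" setup concrete, here is a rough
sketch that generates one kernelspec (the contents of a kernel.json) per
Spark install. The keys argv, display_name, language and env, and the
{connection_file} placeholder, are the standard kernelspec fields. The
launcher path, its --connection-file flag and the Spark install locations
are hypothetical stand-ins for whatever an admin actually deploys.

```python
import json


def kernel_spec(spark_version, spark_home,
                launcher="/opt/example-kernel/bin/run.sh"):
    """Return the dict an admin would write to <name>/kernel.json.

    The launcher and its flag are hypothetical; a real kernel documents
    its own command line.
    """
    return {
        "argv": [launcher, "--connection-file", "{connection_file}"],
        "display_name": "Scala (Spark {})".format(spark_version),
        "language": "scala",
        # The kernel process picks its Spark install from the environment.
        "env": {"SPARK_HOME": spark_home},
    }


# Hypothetical install locations, one per Spark version.
SPARK_INSTALLS = {
    "1.6.3": "/opt/spark-1.6.3",
    "2.1.0": "/opt/spark-2.1.0",
}

specs = {
    "scala_spark_" + version.replace(".", "_"): kernel_spec(version, home)
    for version, home in SPARK_INSTALLS.items()
}

for name, spec in sorted(specs.items()):
    print(name + "/kernel.json")
    print(json.dumps(spec, indent=2))
```

Each generated directory shows up as its own entry in the notebook's kernel
picker, which is why this works well when an operator manages the install and
less well for a single user who wants to switch Spark versions on the fly.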

Focus efforts within kernel communities

This is larger in scope than just the Scala kernel: we need Jupyter to
acknowledge fully supported kernels. In contrast, the whole Zeppelin
community collaborates in one repository around their interpreters.

“Fragmentation of kernels makes it harder for large enterprises to adopt
them.”

- Tristan Zajonc (Cloudera)

Beyond the technical implementation of what is a supported kernel, we also
need the messaging to end users to be simple and clear. There are several
things we need to do to improve our messaging, organization, and technical
underpinnings.

Action Items:

- On the Jupyter site provide blurbs and links to kernels for R, Python,
  and Scala
- Create an organized effort around the Scala kernel, possibly by unifying
  in an organization while isolating projects in separate repositories
- Align on a specification of what it takes to be acknowledged as a
  supported kernel

Visualization

We would like to push on the idea of mimetypes that output a hunk of JSON
and are able to draw beautiful visualizations. Having these adopted in core
Jupyter by default would go a long way towards providing simple, just-works
visualization. The current landscape of visualization with the Scala
kernels includes:

- Vegas <https://github.com/vegas-viz/Vegas>
- Plotly Scala <https://github.com/alexarchambault/plotly-scala>
- Brunel <https://github.com/Brunel-Visualization/Brunel>
- Data Resource / Table Schema (see
  https://github.com/pandas-dev/pandas/pull/14904)
There is a bit of worry about standardization around the HTML outputs. Some
libraries rely on frontend libraries that may not exist on the frontend or
may mismatch in version - jquery, requirejs, ipywidgets, jupyter, ipython.
In some frontends, at times dictated by the operating environment, the HTML
outputs must be in null origin iframes.

Action Items:

- Continue involvement in Jupyter frontends to provide rich visualization
  out of the box with less configuration and less friction

Standardizing display and reprs for Scala

Since it’s likely that there will still be multiple kernels available for
the JVM, not just within Scala, we want to standardize the way in which you
inspect objects in the JVM. IPython provides a way for libraries to
integrate with IPython automatically for users. We want library developers
to be able to follow a common scheme and be well represented regardless of
the kernel.

Action Items:

- Create a specification for object representation for JVM languages as
  part of the Jupyter project
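
For reference, the IPython scheme mentioned above is method-based: a library
object defines methods such as _repr_html_ or _repr_json_ and the kernel
picks them up. The sketch below is a much simplified stand-in for that
lookup, not IPython's actual formatter; mime_bundle and the Table class are
made up for illustration. A JVM specification could follow the same shape
with an interface or type class in place of the method-name probing.

```python
import html

# Real IPython method names, mapped to the mimetype each one produces.
REPR_METHODS = {
    "text/html": "_repr_html_",
    "application/json": "_repr_json_",
    "image/svg+xml": "_repr_svg_",
}


def mime_bundle(obj):
    """Collect every representation an object offers, keyed by mimetype.

    Simplified stand-in for what a kernel does before sending a result.
    """
    bundle = {"text/plain": repr(obj)}  # always-available fallback
    for mime, method_name in REPR_METHODS.items():
        method = getattr(obj, method_name, None)
        if callable(method):
            result = method()
            if result is not None:
                bundle[mime] = result
    return bundle


class Table:
    """Hypothetical library type that opts in to rich display."""

    def __init__(self, rows):
        self.rows = rows

    def _repr_html_(self):
        cells = "".join(
            "<tr><td>{}</td></tr>".format(html.escape(str(row)))
            for row in self.rows
        )
        return "<table>" + cells + "</table>"


print(sorted(mime_bundle(Table(["a", "b"]))))
print(mime_bundle(42))
```

The library author writes the representation once and never references a
particular kernel, which is the property we want to carry over to the JVM.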

--
Kyle Kelley (@rgbkrk <https://twitter.com/rgbkrk>; lambdaops.com)
