Re: [jupyter] Scala Kernel Discussion

MinRK Mon, 06 Mar 2017 08:58:58 -0800

This is awesome, thanks Kyle (and everyone)!

On Fri, Mar 3, 2017 at 5:14 PM, Kyle Kelley <[email protected]> wrote:


> On February 27, 2017, a group of us met to talk about Scala kernels and
> pave a path forward for Scala users. A YouTube video of the discussion is
> available here:
>
> https://www.youtube.com/watch?v=0NRONVuct0E
>
> What follows is a summary from the call, mostly in linear order from the
> video itself.
>
> Attendees
>
>    - Alexandre Archambault - Jupyter Scala, Ammonium
>    - Ryan Blue (Netflix) - Toree
>    - Gino Bustelo (IBM) - Toree
>    - Joy Chakraborty (Bloomberg) - Spark Magic with Livy
>    - Kyle Kelley (Netflix) - Jupyter
>    - Haley Most (Cloudera) - Toree
>    - Marius van Niekerk (Maxpoint) - Toree, Spylon
>    - Peter Parente (Maxpoint) - Jupyter
>    - Corey Stubbs (IBM) - Toree
>    - Jamie Whitacre (Berkeley) - Jupyter
>    - Tristan Zajonc (Cloudera) - Toree, Livy
>
>
> Each of the people on the call has a preferred kernel, way of building it,
> and integrating it. We have a significant user experience problem in terms
> of users installing and using Scala kernels, beyond just Spark usage. The
> overarching goal is to create a cohesive experience for Scala users when
> they use Jupyter.
>
> When a Scala user (or even a Python developer already familiar with
> Jupyter) comes to the Jupyter ecosystem, they face many options for
> kernels. Having to choose among them while trying to get work done creates
> new friction points for users. As examples, see
> https://twitter.com/chrisalbon/status/833156959150841856 and
> https://twitter.com/sarah_guido/status/833165030296322049.
>
> What are our foundations for REPL libraries in Scala?
>
> Toree was built on top of the Spark REPL, and its developers tried to
> reuse as much code as possible from Spark. For jupyter-scala, Alex
> recognized that the Spark REPL was changing a lot from version to version.
> At the same time, Ammonite <https://github.com/lihaoyi/Ammonite> was
> created to assist in Scala scripting. In order to make big data frameworks
> such as Spark, Flink, and Scio work well in this environment, a fork
> called Ammonium <https://github.com/alexarchambault/ammonium> was created.
> There is some amount of trepidation about using a separate fork as part of
> the kernel community. We should make sure to unify with the originating
> Ammonite and contribute back as part of a larger Scala community that can
> maintain these together.
>
> Action Items:
>
>    - Renew focus on Scala within Toree, improve outward messaging about how
>    Toree provides a Scala kernel
>    - Unify Ammonite and Ammonium ([email protected])
>       - To be used in jupyter-scala, potentially for spylon
>
> There is more than one implementation of the Jupyter protocol in the Java
> Stack.
>
> Toree has one, jupyter-scala has one, and the Clojure kernels have their
> own. People would like to see a stable Jupyter library for the JVM. Some
> think it’s better to have one per language. Regardless of choice, we should
> have a well-supported Jupyter library.
>
> Action Items:
>
>    - Create an idiomatic Java Library for the Jupyter messaging protocol -
>    propose this as an incubation project within Jupyter
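
As a rough illustration of what any such library has to implement, here is
a minimal sketch of the message signing step of the Jupyter messaging
protocol. It is written in Python only for brevity (the proposal above is
for a JVM library). The function names, the "kernel" username, and the key
and session values are hypothetical; the frame order and the HMAC-SHA256
hex signature follow the published wire protocol as I understand it.

```python
import hashlib
import hmac
import json
import uuid
from datetime import datetime, timezone

DELIMITER = b"<IDS|MSG>"  # separates routing identities from the message


def sign(key, parts):
    """Hex HMAC-SHA256 digest over the four serialized dicts, in order."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for part in parts:
        mac.update(part)
    return mac.hexdigest().encode("ascii")


def build_frames(key, msg_type, content, session):
    """Serialize one message (without routing identities) into wire frames."""
    header = {
        "msg_id": uuid.uuid4().hex,
        "session": session,
        "username": "kernel",  # hypothetical value
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": msg_type,
        "version": "5.0",
    }
    # Order matters: header, parent_header, metadata, content.
    parts = [json.dumps(d).encode("utf-8") for d in (header, {}, {}, content)]
    return [DELIMITER, sign(key, parts)] + parts


def verify(key, frames):
    """Recompute the signature of received frames and compare safely."""
    start = frames.index(DELIMITER)
    signature = frames[start + 1]
    parts = frames[start + 2:start + 6]
    return hmac.compare_digest(signature, sign(key, parts))
```

A shared JVM library would own exactly this kind of logic (plus the ZeroMQ
sockets and the connection file), so that each kernel does not have to
reimplement it.
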
>
> Decouple Spark from Scala in kernels
>
> Decouple language-specific parts from the computing framework to allow for
> using other computing frameworks. This is paramount for R and Python. When
> we inevitably want to connect to a GPU cluster, we want to be able to use
> the same foundations of a kernel. These end up being coupled because Spark
> does “slightly weird things” in how it wants its classes compiled. It’s
> thought that there is some amount of specialization and that we can work
> around it. At the very least, we can bake it into the core and leave room
> for other frameworks to have solid built-in support if necessary.
>
> An approach being worked on in Toree right now is lazy loading of Spark.
> One difference between jupyter-scala and Toree is that jupyter-scala can
> dynamically load Spark versions, whereas Toree is bound to a version of
> Spark on deployment. For end users that have operators/admins, kernels
> can be configured per version of Spark (common for Python, R). Spark
> drives a lot of interest in Scala kernels, and many kernels conflate the
> two. This results in poor messaging and experiences for users getting
> started.
>
> Action Items:
>
>    - Lazy load Spark within Toree
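
To make the "one kernel per Spark version" setup concrete, here is a small
sketch that generates kernel.json contents for two Spark installs. It is
written in Python for brevity. The launcher path, install directories,
Spark versions, and kernel names are all hypothetical; only the kernel.json
keys (argv, display_name, language, env) and the {connection_file}
placeholder come from the Jupyter kernel spec format.

```python
import json


def kernel_spec(spark_version, spark_home, launcher):
    """Build the contents of a kernel.json for one Spark installation."""
    return {
        # Jupyter replaces {connection_file} when it starts the kernel.
        "argv": [launcher, "{connection_file}"],
        "display_name": "Scala (Spark {})".format(spark_version),
        "language": "scala",
        "env": {"SPARK_HOME": spark_home},
    }


# Hypothetical installs an admin might expose side by side.
installs = [("1.6.3", "/opt/spark-1.6.3"), ("2.1.0", "/opt/spark-2.1.0")]

specs = {
    "scala_spark_{}".format(version.replace(".", "_")): kernel_spec(
        version, home, "/opt/scala-kernel/run.sh"
    )
    for version, home in installs
}

for name, spec in sorted(specs.items()):
    print(name)
    print(json.dumps(spec, indent=2))
```

Each entry would be written to its own directory as kernel.json, so users
pick a Spark version from the kernel list rather than configuring it
themselves.
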
>
> Focus efforts within kernel communities
>
> This is larger in scope than just the Scala kernel: we need Jupyter to
> acknowledge fully supported kernels. In contrast, the whole community in
> Zeppelin collaborates in one repository around their interpreters.
>
> “Fragmentation of kernels makes it harder for large enterprises to adopt
> them.”
>
> - Tristan Zajonc (Cloudera)
>
> Beyond the technical implementation of what is a supported kernel, we also
> need the messaging to end users to be simple and clear. There are several
> objectives we need to meet to improve our messaging, organization, and
> technical underpinnings.
>
> Action Items
>
>    - On the Jupyter site provide blurbs and links to kernels for R, Python,
>    and Scala
>    - Create an organized effort around the Scala Kernel, possibly by
>    unifying in an organization while isolating projects in separate
>    repositories
>    - Align a specification of what it takes to be acknowledged as a
>    supported kernel
>
> Visualization
>
> We would like to be able to push on the idea of mimetypes whose output is
> a hunk of JSON that the frontend can draw as a beautiful visualization.
> Having these adopted in core Jupyter by default would go a long way
> towards providing simple, just-works visualization. The current landscape
> of visualization with the Scala kernels includes:
>
>
>    - Vegas <https://github.com/vegas-viz/Vegas>
>    - Plotly Scala <https://github.com/alexarchambault/plotly-scala>
>    - Brunel <https://github.com/Brunel-Visualization/Brunel>
>    - Data Resource / Table Schema (see
>    https://github.com/pandas-dev/pandas/pull/14904)
>
>
> There is a bit of worry about standardization around the HTML outputs.
> Some libraries rely on frontend libraries that may not exist on the
> frontend or may mismatch in version: jQuery, RequireJS, ipywidgets,
> Jupyter, IPython. In some frontends, at times dictated by the operating
> environment, the HTML outputs must be in null-origin iframes.
>
> Action Items
>
>    - Continue involvement in Jupyter frontends to provide rich
>    visualization out of the box with less configuration and less friction
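
To illustrate the "hunk of JSON under a mimetype" idea, here is a minimal
sketch of the content of a display_data message carrying a table. It is
written in Python for brevity; a Scala kernel would emit the same JSON. The
mimetype name application/vnd.dataresource+json is my understanding of what
the pandas pull request linked above uses, so treat it as an assumption.
The helper names and the very naive type inference are hypothetical.

```python
import json

# Assumed mimetype for Data Resource / Table Schema output.
TABLE_MIME = "application/vnd.dataresource+json"


def infer_type(value):
    """Map a Python value to a Table Schema type name (simplified)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    return "string"


def table_display_data(rows):
    """Build display_data content with a rich table and a plain-text fallback."""
    first = rows[0] if rows else {}
    fields = [{"name": name, "type": infer_type(v)} for name, v in first.items()]
    return {
        "data": {
            TABLE_MIME: {"schema": {"fields": fields}, "data": rows},
            "text/plain": json.dumps(rows),  # for frontends without a renderer
        },
        "metadata": {},
    }


content = table_display_data([{"kernel": "toree", "spark": True, "year": 2017}])
print(json.dumps(content, indent=2))
```

Because the payload is plain data rather than HTML and JavaScript, it
sidesteps the frontend library and iframe concerns raised above: the
frontend decides how to draw it.
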
>
> Standardizing display and reprs for Scala
>
> Since it’s likely that there will still be multiple kernels available for
> the JVM, not just within Scala, we want to standardize the way in which
> you inspect objects in the JVM. IPython provides a way for libraries to
> integrate with IPython automatically for users. We want library developers
> to be able to follow a common scheme and be well represented regardless of
> the kernel.
>
> Action Items:
>
>    - Create a specification for object representation for JVM languages
>    as part of the Jupyter project
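
For reference, the IPython convention mentioned above works through
optional methods such as _repr_html_ that a library defines on its own
objects. The sketch below is a simplified stand-in for that lookup, not
IPython's actual implementation; mime_bundle and Point are hypothetical
names. A JVM specification could take a similar shape, for example an
interface or type class that kernels query for each mimetype.

```python
# Optional method name -> mimetype it produces (a subset of IPython's list).
REPR_METHODS = {
    "_repr_html_": "text/html",
    "_repr_markdown_": "text/markdown",
}


def mime_bundle(obj):
    """Collect every representation an object offers, keyed by mimetype."""
    bundle = {"text/plain": repr(obj)}  # always available as a fallback
    for method_name, mime in REPR_METHODS.items():
        method = getattr(obj, method_name, None)
        if callable(method):
            result = method()
            if result is not None:
                bundle[mime] = result
    return bundle


class Point:
    """Example library type that opts in to rich HTML display."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return "Point({}, {})".format(self.x, self.y)

    def _repr_html_(self):
        return "<b>Point</b>({}, {})".format(self.x, self.y)


print(mime_bundle(Point(1, 2)))
print(mime_bundle(3))
```

The library author only writes the methods; any kernel that follows the
same lookup gets rich output without kernel-specific code.
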
>
>
> --
> Kyle Kelley (@rgbkrk <https://twitter.com/rgbkrk>; lambdaops.com)
>
> --
> You received this message because you are subscribed to the Google Groups
> "Project Jupyter" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/jupyter/CA%2BtbMaUQzt4tb9HVtEKaxrpmGib%3DbENhoYk%3D910vc01oid%3DNhA%40mail.gmail.com.
> For more options, visit https://groups.google.com/d/optout.
>

