Thanks for taking the lead on this, Kyle!

On Fri, Mar 3, 2017 at 5:14 PM, Kyle Kelley <[email protected]> wrote:
> On February 27, 2017, a group of us met to talk about Scala kernels and pave
> a path forward for Scala users. A YouTube video of the discussion is
> available here:
>
>
> https://www.youtube.com/watch?v=0NRONVuct0E
>
>
> What follows is a summary from the call, mostly in linear order from the
> video itself.
>
> Attendees
>
> Alexander Archambault - Jupyter Scala, Ammonium
>
> Ryan Blue (Netflix) - Toree
>
> Gino Bustelo (IBM) - Toree
>
> Joy Chakraborty (Bloomberg) - Spark Magic with Livy
>
> Kyle Kelley (Netflix) - Jupyter
>
> Haley Most (Cloudera) - Toree
>
> Marius van Niekerk (Maxpoint) - Toree, Spylon
>
> Peter Parente (Maxpoint) - Jupyter
>
> Corey Stubbs (IBM) - Toree
>
> Jamie Whitacre (Berkeley) - Jupyter
>
> Tristan Zajonc (Cloudera) - Toree, Livy
>
>
> Each of the people on the call has a preferred kernel and a preferred way of
> building and integrating it. We have a significant user experience problem in
> terms of users installing and using Scala kernels, beyond just Spark usage.
> The overarching goal is to create a cohesive experience for Scala users when
> they use Jupyter.
>
>
> When a Scala user comes to the Jupyter ecosystem (or even a Python developer
> already familiar with it), they face many options for kernels. Being
> confronted with so much choice when trying to get things done creates new
> friction points for users. As examples, see
> https://twitter.com/chrisalbon/status/833156959150841856 and
> https://twitter.com/sarah_guido/status/833165030296322049.
>
> What are our foundations for REPL libraries in Scala?
>
>
> Toree was built on top of the Spark REPL, and its developers tried to reuse
> as much code as possible from Spark. For Alex’s jupyter-scala, he recognized
> that the Spark REPL was changing a lot from version to version. At the same
> time, Ammonite was created to assist in Scala scripting. To make big data
> frameworks such as Spark, Flink, and Scio work well in this environment, a
> fork called Ammonium was created. There is some trepidation within the kernel
> community about relying on a separate fork. We should make sure to unify with
> the upstream Ammonite and contribute back as part of a larger Scala community
> that can maintain these together.
>
> Action Items:
>
> Renew focus on Scala within Toree; improve outward messaging about how Toree
> provides a Scala kernel
>
> Unify Ammonite and Ammonium ([email protected])
>
> To be used in jupyter-scala, potentially for spylon
>
> There is more than one implementation of the Jupyter protocol in the JVM
> stack.
>
>
> Toree has one, jupyter-scala has one, and the Clojure kernels have their own.
> People would like to see a stable Jupyter library for the JVM. Some think
> it’s better to have one per language. Regardless of the choice, we should
> have a well-supported Jupyter library.
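>
> For reference, any such library ultimately models the same wire format. Below
> is a minimal Scala sketch of the message envelope defined by the Jupyter
> messaging spec (the field names come from the spec; the case class names are
> only illustrative):
>
>   // The envelope every Jupyter kernel exchanges over ZeroMQ.
>   case class Header(
>     msgId: String,     // "msg_id"
>     username: String,
>     session: String,
>     date: String,      // ISO 8601 timestamp
>     msgType: String,   // e.g. "execute_request", "execute_reply"
>     version: String    // protocol version, e.g. "5.0"
>   )
>
>   case class JupyterMessage(
>     header: Header,
>     parentHeader: Option[Header],   // "parent_header"; empty dict on the wire
>     metadata: Map[String, String],
>     content: Map[String, Any]       // message-type-specific payload
>   )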
>
> Action Items:
>
>
> Create an idiomatic Java library for the Jupyter messaging protocol -
> propose this as an incubation project within Jupyter
>
> Decouple Spark from Scala in kernels
>
>
> Decouple the language-specific parts from the computing framework to allow
> for using other computing frameworks. This is paramount for R and Python.
> When we inevitably want to connect to a GPU cluster, we want to be able to
> use the same foundations of a kernel. The reason these end up being coupled
> is that Spark does “slightly weird things” with how it wants its classes
> compiled. It’s thought that this specialization is limited and that we can
> work around it. At the very least, we can bake Spark support into the core
> and leave room for other frameworks to have solid built-in support if
> necessary.
>
>
> An approach being worked on in Toree right now is lazy loading of Spark. One
> difference between jupyter-scala and Toree is that jupyter-scala can
> dynamically load Spark versions, whereas Toree is bound to a version of Spark
> at deployment. For end users that have operators/admins, a kernel can be
> configured for each Spark version it will use (as is common for Python and
> R). Spark drives a lot of the interest in Scala kernels, and many kernels
> conflate the two. This results in poor messaging and experiences for users
> getting started.
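>
> As a rough illustration of the lazy-loading idea (not Toree’s actual
> implementation; the object and environment-variable names here are only
> assumptions), a kernel can defer constructing the SparkSession until the
> first cell actually touches it:
>
>   import org.apache.spark.sql.SparkSession
>
>   object KernelState {
>     // Nothing Spark-related is initialized at kernel startup; the session
>     // is only built the first time user code references `spark`.
>     lazy val spark: SparkSession = SparkSession.builder()
>       .appName("scala-kernel")
>       .master(sys.env.getOrElse("SPARK_MASTER", "local[*]"))
>       .getOrCreate()
>   }
>
> With this shape, a kernel session that only ever runs plain Scala never pays
> the startup cost of creating a Spark context.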
>
> Action Items:
>
>
> Lazy load Spark within Toree
>
> Focus efforts within kernel communities
>
>
> This is larger in scope than just the Scala kernel: we need Jupyter to
> acknowledge fully supported kernels. In contrast, the whole Zeppelin
> community collaborates in one repository around their interpreters.
>
>
> “Fragmentation of kernels makes it harder for large enterprises to adopt
> them.”
>
> - Tristan Zajonc (Cloudera)
>
>
> Beyond the technical question of what makes a supported kernel, we also need
> the messaging to end users to be simple and clear. There are several things
> we need to do to improve our messaging, organization, and technical
> underpinnings.
>
> Action Items:
>
>
> On the Jupyter site, provide blurbs and links to kernels for R, Python, and
> Scala
>
> Create an organized effort around the Scala kernel, possibly by unifying
> under one organization while isolating projects in separate repositories
>
> Align on a specification of what it takes to be acknowledged as a supported
> kernel
>
> Visualization
>
> We would like to push on the idea of mimetypes that output a hunk of JSON and
> are rendered as rich visualizations. Having these adopted in core Jupyter by
> default would go a long way towards providing simple, “just works”
> visualization (see the sketch after the list below). The current landscape of
> visualization with the Scala kernels includes
>
>
> Vegas
>
> Plotly Scala
>
> Brunel
>
> Data Resource / Table Schema (see
> https://github.com/pandas-dev/pandas/pull/14904)
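>
> To make the mimetype idea concrete, here is a sketch (not tied to any of the
> libraries above; the fallback text and values are made up) of the `data`
> field of a display_data message a Scala kernel could emit, keyed by a JSON
> mimetype such as Vega-Lite’s, which a capable frontend then renders as a
> chart:
>
>   // Build a Vega-Lite spec as JSON and offer it alongside a plain-text
>   // fallback; the frontend picks the richest mimetype it can render.
>   val vegaLiteSpec: String =
>     """{
>       |  "data": {"values": [{"x": 1, "y": 2}, {"x": 2, "y": 7}]},
>       |  "mark": "line",
>       |  "encoding": {
>       |    "x": {"field": "x", "type": "quantitative"},
>       |    "y": {"field": "y", "type": "quantitative"}
>       |  }
>       |}""".stripMargin
>
>   val displayData: Map[String, String] = Map(
>     "application/vnd.vegalite.v1+json" -> vegaLiteSpec,
>     "text/plain"                       -> "Line chart of 2 points"
>   )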
>
>
> There is a bit of worry about standardization around the HTML outputs. Some
> libraries try to use frontend libraries that may not exist on the frontend or
> may be mismatched in version - jquery, requirejs, ipywidgets, jupyter,
> ipython. In some frontends, at times dictated by the operating environment,
> the HTML outputs must be rendered in null-origin iframes.
>
> Action Items:
>
> Continue involvement in Jupyter frontends to provide rich visualization out
> of the box with less configuration and less friction
>
> Standardizing display and reprs for Scala
>
>
> Since it’s likely that there will still be multiple kernels available for the
> JVM, not just within Scala, we want to standardize the way in which you
> inspect objects on the JVM. IPython provides a way for libraries to integrate
> with it automatically for users. We want library developers to be able to
> follow a common scheme and be well represented regardless of the kernel.
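>
> A minimal sketch of what such a convention could look like (the trait and
> method names here are purely hypothetical, by analogy with IPython’s
> _repr_html_ mechanism):
>
>   // Hypothetical convention: a library type opts into rich display by
>   // exposing its representations keyed by mimetype; any JVM kernel could
>   // look for this trait when rendering a result.
>   trait RichDisplay {
>     def display: Map[String, String]   // mimetype -> rendered payload
>   }
>
>   final case class Table(columns: Seq[String], rows: Seq[Seq[Any]])
>       extends RichDisplay {
>     def display: Map[String, String] = Map(
>       "text/html" ->
>         (s"<table><tr>${columns.map(c => s"<th>$c</th>").mkString}</tr>" +
>          rows.map(r => s"<tr>${r.map(v => s"<td>$v</td>").mkString}</tr>").mkString +
>          "</table>"),
>       "text/plain" -> rows.map(_.mkString("\t")).mkString("\n")
>     )
>   }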
>
> Action Items:
>
> Create a specification for object representation for JVM languages as part
> of the Jupyter project
>
>
> --
> Kyle Kelley (@rgbkrk; lambdaops.com)
>



-- 
Brian E. Granger
Associate Professor of Physics and Data Science
Cal Poly State University, San Luis Obispo
@ellisonbg on Twitter and GitHub
[email protected] and [email protected]

-- 
You received this message because you are subscribed to the Google Groups 
"Project Jupyter" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/jupyter/CAH4pYpT5Pfz6ck1zgdTW%2BfZ%3DUA72VgjQKz7jNqowYVz-8ri%2BYg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
