This is awesome, thanks Kyle (and everyone)!

On Fri, Mar 3, 2017 at 5:14 PM, Kyle Kelley <[email protected]> wrote:
> On February 27, 2017, a group of us met to talk about Scala kernels and
> pave a path forward for Scala users. A YouTube video of the discussion is
> available here:
>
> https://www.youtube.com/watch?v=0NRONVuct0E
>
> What follows is a summary of the call, mostly in linear order from the
> video itself.
>
> Attendees
>
> - Alexander Archambault - Jupyter Scala, Ammonium
> - Ryan Blue (Netflix) - Toree
> - Gino Bustelo (IBM) - Toree
> - Joy Chakraborty (Bloomberg) - Spark Magic with Livy
> - Kyle Kelley (Netflix) - Jupyter
> - Haley Most (Cloudera) - Toree
> - Marius van Niekerk (Maxpoint) - Toree, Spylon
> - Peter Parente (Maxpoint) - Jupyter
> - Corey Stubbs (IBM) - Toree
> - Jamie Whitacre (Berkeley) - Jupyter
> - Tristan Zajonc (Cloudera) - Toree, Livy
>
> Each of the people on the call has a preferred kernel, a preferred way of
> building it, and a preferred way of integrating it. We have a significant
> user experience problem in terms of users installing and using Scala
> kernels, beyond just Spark usage. The overarching goal is to create a
> cohesive experience for Scala users when they use Jupyter.
>
> When a Scala user comes to the Jupyter ecosystem (or even a Python
> developer who already knows Jupyter), they face many options for kernels.
> Having to choose among them while trying to get things done creates new
> friction points for users. As examples, see
> https://twitter.com/chrisalbon/status/833156959150841856 and
> https://twitter.com/sarah_guido/status/833165030296322049.
>
> What are our foundations for REPL libraries in Scala?
>
> Toree was built on top of the Spark REPL, and its developers tried to reuse
> as much code as possible from Spark. For jupyter-scala, Alex recognized
> that the Spark REPL was changing a lot from version to version. At the same
> time, Ammonite <https://github.com/lihaoyi/Ammonite> was created to
> assist in Scala scripting.
> In order to make big data frameworks such as Spark, Flink, and Scio work
> well in this environment, a fork called Ammonium
> <https://github.com/alexarchambault/ammonium> was created. Within the
> kernel community there is some trepidation about relying on a separate
> fork. We should make sure to unify with the originating Ammonite and
> contribute back as part of a larger Scala community that can maintain
> these together.
>
> Action Items:
>
> - Renew focus on Scala within Toree, improve outward messaging about how
>   Toree provides a Scala kernel
> - Unify Ammonite and Ammonium ([email protected])
>   - To be used in jupyter-scala, potentially for spylon
>
> There is more than one implementation of the Jupyter protocol in the Java
> Stack.
>
> Toree has one, jupyter-scala has one, and the Clojure kernels have their
> own. People would like to see a stable Jupyter library for the JVM. Some
> think it’s better to have one per language. Regardless of choice, we should
> have a well-supported Jupyter library.
>
> Action Items:
>
> - Create an idiomatic Java library for the Jupyter messaging protocol -
>   propose this as an incubation project within Jupyter
>
> Decouple Spark from Scala in kernels
>
> Decouple language-specific parts from the computing framework to allow for
> using other computing frameworks. This is paramount for R and Python. When
> we inevitably want to connect to a GPU cluster, we want to be able to use
> the same foundations of a kernel. The reason that these end up being
> coupled is that Spark does “slightly weird things” for how it wants its
> classes compiled. It’s thought that there is some amount of specialization
> and that we can work around it. At the very least, we can bake it into the
> core and leave room for other frameworks to have solid built-in support if
> necessary.
>
> An approach being worked on in Toree right now is lazy loading of Spark.
> One concern that differs between jupyter-scala and Toree is that
> jupyter-scala can dynamically load Spark versions, whereas Toree is
> bound to a version of Spark on deployment. For end users that have
> operators/admins, kernels can be configured per version of Spark they will
> use (common for Python, R). Spark drives lots of interest in Scala kernels,
> and many kernels conflate the two. This results in poor messaging and
> experiences for users getting started.
>
> Action Items:
>
> - Lazy load Spark within Toree
>
> Focus efforts within kernel communities
>
> Larger in scope than just the Scala kernel, we need Jupyter to acknowledge
> fully supported kernels. In contrast, the whole community in Zeppelin
> collaborates in one repository around their interpreters.
>
> “Fragmentation of kernels makes it harder for large enterprises to adopt
> them.”
>
> - Tristan Zajonc (Cloudera)
>
> Beyond the technical implementation of what is a supported kernel, we also
> need the messaging to end users to be simple and clear. There are several
> things we need to do to improve our messaging, organization, and
> technical underpinnings.
>
> Action Items
>
> - On the Jupyter site provide blurbs and links to kernels for R, Python,
>   and Scala
> - Create an organized effort around the Scala kernel, possibly by
>   unifying in an organization while isolating projects in separate
>   repositories
> - Align on a specification of what it takes to be acknowledged as a
>   supported kernel
>
> Visualization
>
> We would like to be able to push on the idea of mimetypes that output a
> hunk of JSON and are able to draw beautiful visualizations. Having these
> adopted in core Jupyter by default would go a long way towards providing
> simple, just-works visualization.
> The current landscape of visualization with the Scala kernels includes:
>
> - Vegas <https://github.com/vegas-viz/Vegas>
> - Plotly Scala <https://github.com/alexarchambault/plotly-scala>
> - Brunel <https://github.com/Brunel-Visualization/Brunel>
> - Data Resource / Table Schema (see
>   https://github.com/pandas-dev/pandas/pull/14904)
>
> There is a bit of worry about standardization around the HTML outputs.
> Some libraries try to use frontend libraries that may not exist in the
> frontend or may differ in version - jquery, requirejs, ipywidgets, jupyter,
> ipython. In some frontends, at times dictated by the operating environment,
> the HTML outputs must be in null origin iframes.
>
> Action Items
>
> - Continue involvement in Jupyter frontends to provide rich
>   visualization out of the box with less configuration and less friction
>
> Standardizing display and reprs for Scala
>
> Since it’s likely that there will still be multiple kernels available
> for the JVM, not just within Scala, we want to standardize the way in which
> you inspect objects in the JVM. IPython provides a way for libraries to
> integrate with IPython automatically for users. We want library developers
> to be able to follow a common scheme and be well represented regardless of
> the kernel.
>
> Action Items:
>
> - Create a specification for object representation for JVM languages
>   as part of the Jupyter project
>
> --
> Kyle Kelley (@rgbkrk <https://twitter.com/rgbkrk>; lambdaops.com)
>
> --
> You received this message because you are subscribed to the Google Groups
> "Project Jupyter" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/jupyter/CA%2BtbMaUQzt4tb9HVtEKaxrpmGib%3DbENhoYk%3D910vc01oid%3DNhA%40mail.gmail.com.
> For more options, visit https://groups.google.com/d/optout.
