Thanks for taking the lead on this Kyle!

On Fri, Mar 3, 2017 at 5:14 PM, Kyle Kelley <[email protected]> wrote:

> On February 27, 2017 a group of us met to talk about Scala kernels and
> pave a path forward for Scala users. A YouTube video of the discussion is
> available here:
>
> https://www.youtube.com/watch?v=0NRONVuct0E
>
> What follows is a summary from the call, mostly in linear order from the
> video itself.
>
> Attendees
>
> Alexander Archambault - Jupyter Scala, Ammonium
> Ryan Blue (Netflix) - Toree
> Gino Bustelo (IBM) - Toree
> Joy Chakraborty (Bloomberg) - Spark Magic with Livy
> Kyle Kelley (Netflix) - Jupyter
> Haley Most (Cloudera) - Toree
> Marius van Niekerk (Maxpoint) - Toree, Spylon
> Peter Parente (Maxpoint) - Jupyter
> Corey Stubbs (IBM) - Toree
> Jamie Whitacre (Berkeley) - Jupyter
> Tristan Zajonc (Cloudera) - Toree, Livy
>
> Each of the people on the call has a preferred kernel, a preferred way of
> building it, and a preferred way of integrating it. We have a significant
> user experience problem around installing and using Scala kernels, beyond
> just Spark usage. The overarching goal is to create a cohesive experience
> for Scala users when they use Jupyter.
>
> When a Scala user (or even an experienced Python developer) comes to the
> Jupyter ecosystem, they face many options for kernels. Being confronted
> with that choice while trying to get things done creates new friction
> points for users. As examples, see
> https://twitter.com/chrisalbon/status/833156959150841856 and
> https://twitter.com/sarah_guido/status/833165030296322049.
>
> What are our foundations for REPL libraries in Scala?
>
> Toree was built on top of the Spark REPL, and its developers tried to
> reuse as much code as possible from Spark. For Alex's jupyter-scala, he
> recognized that the Spark REPL was changing a lot from version to version.
> At the same time, Ammonite was created to assist in Scala scripting. In
> order to make big data frameworks such as Spark, Flink, and Scio work well
> in this environment, a fork called Ammonium was created. There is some
> trepidation in the kernel community about relying on a separate fork. We
> should make sure to unify with the originating Ammonite and contribute
> back as part of a larger Scala community that can maintain these together.
>
> Action Items:
>
> Renew focus on Scala within Toree; improve outward messaging about how
> Toree provides a Scala kernel
> Unify Ammonite and Ammonium ([email protected])
> To be used in jupyter-scala, potentially for spylon
>
> There is more than one implementation of the Jupyter protocol in the Java
> stack.
>
> Toree has one, jupyter-scala has one, and the Clojure kernels have their
> own. People would like to see a stable Jupyter library for the JVM. Some
> think it's better to have one per language. Regardless of the choice, we
> should have a well supported Jupyter library.
>
> Action Items:
>
> Create an idiomatic Java library for the Jupyter messaging protocol -
> propose this as an incubation project within Jupyter
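
To make the protocol-library idea a bit more concrete, the sketch below shows what typed Jupyter messages could look like on the JVM. The wire-format fields follow the Jupyter messaging spec (msg_id, username, session, date, msg_type, version); the Scala class, object, and field names are only placeholders, not any existing project's API.

```scala
// Hypothetical sketch of typed Jupyter protocol messages for a JVM library.
// The wire-format field names come from the Jupyter messaging spec; the
// Scala names are illustrative only.

case class Header(
  msgId: String,    // "msg_id" on the wire
  username: String,
  session: String,
  date: String,     // ISO 8601 timestamp
  msgType: String,  // e.g. "execute_request", "execute_reply"
  version: String   // protocol version, e.g. "5.0"
)

case class JupyterMessage(
  header: Header,
  parentHeader: Option[Header],  // "parent_header"; an empty dict on the wire when absent
  metadata: Map[String, String],
  content: Map[String, Any]      // message-type-specific payload
)

// Example: the rough shape of an execute_request a frontend sends to a kernel.
object ProtocolExample {
  val executeRequest = JupyterMessage(
    header = Header(
      msgId = java.util.UUID.randomUUID().toString,
      username = "demo",
      session = java.util.UUID.randomUUID().toString,
      date = java.time.Instant.now().toString,
      msgType = "execute_request",
      version = "5.0"
    ),
    parentHeader = None,
    metadata = Map.empty,
    content = Map("code" -> "1 + 1", "silent" -> false)
  )
}
```

A real library would also have to cover the ZeroMQ socket handling and HMAC signing the protocol requires; the point here is that a small set of shared, typed message classes is exactly what several JVM kernels are currently reimplementing independently.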
>
> Decouple Spark from Scala in kernels
>
> Decouple the language-specific parts from the computing framework so that
> other computing frameworks can be used. This is paramount for R and
> Python. When we inevitably want to connect to a GPU cluster, we want to be
> able to use the same foundations of a kernel. The reason these end up
> coupled is that Spark does "slightly weird things" with how it wants its
> classes compiled. It's thought that this amounts to some specialization we
> can work around. At the very least, we can bake it into the core and leave
> room for other frameworks to have solid built-in support if necessary.
>
> An approach being worked on in Toree right now is lazy loading of Spark.
> One difference between jupyter-scala and Toree is that jupyter-scala can
> dynamically load Spark versions, whereas Toree is bound to a version of
> Spark at deployment. For end users that have operators/admins, kernels can
> be configured per version of Spark they will use (as is common for Python
> and R). Spark drives a lot of the interest in Scala kernels, and many
> kernels conflate the two. This results in poor messaging and a poor
> experience for users getting started.
>
> Action Items:
>
> Lazy load Spark within Toree
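
The heart of the lazy-loading approach can be illustrated in a few lines. This is a minimal sketch under assumed names, not Toree's actual internals: nothing Spark-related is initialized at kernel startup, and the SparkSession is only created the first time user code touches it.

```scala
// Minimal sketch of lazy Spark initialization in a kernel. Assumes Spark is
// on the classpath; the object name and builder settings are illustrative.
import org.apache.spark.sql.SparkSession

object KernelSparkProvider {
  // Not evaluated at kernel startup; the session is created on first access.
  lazy val spark: SparkSession = SparkSession.builder()
    .appName("scala-kernel")
    .master("local[*]") // a real kernel would take master/deploy mode from its config
    .getOrCreate()
}

// A plain Scala cell never touches KernelSparkProvider.spark and never pays
// the Spark startup cost; a Spark cell triggers creation on first use:
//   val df = KernelSparkProvider.spark.range(10).toDF("n")
```

The same idea extends to the operator/admin scenario above: one kernelspec per Spark version, each pointing the kernel at a different Spark installation, rather than baking a single Spark version into the kernel itself.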
>
> Focus efforts within kernel communities
>
> This is larger in scope than just the Scala kernel: we need Jupyter to
> acknowledge fully supported kernels. In contrast, the whole Zeppelin
> community collaborates in one repository around their interpreters.
>
> "Fragmentation of kernels makes it harder for large enterprises to adopt
> them."
> - Tristan Zajonc (Cloudera)
>
> Beyond the technical question of what counts as a supported kernel, the
> messaging to end users also needs to be simple and clear. There are
> several things we need to do to improve our messaging, organization, and
> technical underpinnings.
>
> Action Items:
>
> On the Jupyter site, provide blurbs and links to kernels for R, Python,
> and Scala
> Create an organized effort around the Scala kernel, possibly by unifying
> in one organization while keeping projects in separate repositories
> Align on a specification of what it takes to be acknowledged as a
> supported kernel
>
> Visualization
>
> We would like to push on the idea of mimetypes: a library outputs a chunk
> of JSON under a known mimetype, and the frontend draws a rich
> visualization from it. Having these adopted in core Jupyter by default
> would go a long way toward providing simple, just-works visualization. The
> current landscape of visualization with the Scala kernels includes:
>
> Vegas
> Plotly Scala
> Brunel
> Data Resource / Table Schema (see
> https://github.com/pandas-dev/pandas/pull/14904)
>
> There is some worry about standardization of the HTML outputs. Some
> libraries depend on frontend libraries that may not exist on the frontend
> or may mismatch in version - jquery, requirejs, ipywidgets, jupyter,
> ipython. In some frontends, at times dictated by the operating
> environment, the HTML outputs must live in null-origin iframes.
>
> Action Items:
>
> Continue involvement in Jupyter frontends to provide rich visualization
> out of the box with less configuration and less friction
>
> Standardizing display and reprs for Scala
>
> Since it's likely that there will still be multiple kernels available for
> the JVM, not just within Scala, we want to standardize the way in which
> you inspect objects on the JVM. IPython provides a way for libraries to
> integrate with IPython automatically for users. We want library developers
> to be able to follow a common scheme and be well represented regardless of
> the kernel.
>
> Action Items:
>
> Create a specification for object representation for JVM languages as
> part of the Jupyter project
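
Tying the visualization and repr threads together: one shape such a specification could take is a small interface that library objects implement to produce a MIME bundle (mimetype -> payload), which any conforming JVM kernel then forwards as the data of a display_data message. Everything below is a sketch of what a spec might standardize - the trait, method, and chart type are hypothetical, and the Vega-Lite mimetype is just one example of a JSON-based output a frontend can render.

```scala
// Sketch of a common "rich display" convention for JVM objects. Nothing here
// is an existing API; it only illustrates what a Jupyter-wide specification
// for JVM reprs might look like.

// A library-side object renders itself to a MIME bundle: mimetype -> payload.
trait MimeRenderable {
  def toMimeBundle: Map[String, String]
}

// Example: a tiny chart type that emits a Vega-Lite spec as JSON, plus a
// plain-text fallback for frontends that cannot render the JSON mimetype.
final case class BarChart(values: Seq[(String, Double)]) extends MimeRenderable {
  private def vegaLiteSpec: String = {
    val rows = values
      .map { case (label, value) => s"""{"label": "$label", "value": $value}""" }
      .mkString("[", ", ", "]")
    s"""{
       |  "mark": "bar",
       |  "data": {"values": $rows},
       |  "encoding": {
       |    "x": {"field": "label", "type": "nominal"},
       |    "y": {"field": "value", "type": "quantitative"}
       |  }
       |}""".stripMargin
  }

  def toMimeBundle: Map[String, String] = Map(
    "application/vnd.vegalite.v1+json" -> vegaLiteSpec,
    "text/plain" -> values.map { case (l, v) => s"$l: $v" }.mkString(", ")
  )
}

// A kernel that understands the convention publishes the bundle as the data
// field of a display_data message, regardless of which JVM language produced it.
```

The plain-text entry gives frontends that do not understand the JSON mimetype something to fall back on, which also sidesteps some of the jquery/requirejs version-mismatch issues with raw HTML outputs mentioned above.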
> --
> Kyle Kelley (@rgbkrk; lambdaops.com)

--
Brian E. Granger
Associate Professor of Physics and Data Science
Cal Poly State University, San Luis Obispo
@ellisonbg on Twitter and GitHub
[email protected] and [email protected]
