Re: [jupyter] Re: Scala Kernel Discussion

Scott Draves Fri, 24 Mar 2017 17:14:58 -0700

Thanks! We have kernels for Clojure, SQL, C++, and R also derived from
this base kernel (I know the last two sound weird, but they use JNI and
Rserve, respectively, to good effect).


I understand more kernels isn't exactly what you are looking for :) But 
they were all pretty much working already, so we might as well get them out 
there before comparing their features and properties with the rest, and 
working towards standard kernels.

Next for me on this front is definitely to learn more about all the
options. I am a newcomer and it will take some time, so I hope you will be
forgiving in the meantime.

Best, -Scott


On Friday, March 24, 2017 at 4:24:58 PM UTC-4, rgbkrk wrote:
>
> Scott,
>
> I really like the base Java package for building kernels on the JVM, since 
> it isn't tied to Scala or Groovy -- people can build on top of it. I'm 
> especially happy you licensed it all as Apache 2.
>
> All,
>
> What do people think is next for integrating amongst the various projects 
> and kernels?
>
>
>
> On Wed, Mar 22, 2017 at 8:28 AM, Scott Draves <[email protected]> wrote:
>
>> I am really happy to hear that Spark support is getting serious attention.
>>
>> We have been working for some time in this area, and have some code to
>> share: https://github.com/twosigma/beaker-notebook-private (despite the
>> name, this repository is open).
>>
>> This includes a Scala kernel (implemented on a base JVM kernel), along
>> with other languages such as Java and Groovy, and more are coming. The
>> base kernel (implemented in Java) has classes for comms and widgets. In its
>> previous incarnation in Beaker Notebook, we built a nice UI for integration
>> with Spark: https://github.com/twosigma/beaker-notebook/issues/4943, and
>> we are looking for equivalent functionality in Jupyter.
>>
>> There is a lot more to say about this, and questions to ask, but I just
>> want to join the conversation sooner rather than later, since it is moving
>> so fast.
>>
>> Best, -Scott
>>
>>
>> On Friday, March 3, 2017 at 8:14:52 PM UTC-5, rgbkrk wrote:
>>>
>>> On February 27, 2017, a group of us met to talk about Scala kernels and
>>> pave a path forward for Scala users. A YouTube video of the discussion is
>>> available here:
>>>
>>> https://www.youtube.com/watch?v=0NRONVuct0E
>>>
>>> What follows is a summary from the call, mostly in linear order from the 
>>> video itself.
>>> Attendees
>>>
>>>    - Alexandre Archambault - Jupyter Scala, Ammonium
>>>    - Ryan Blue (Netflix) - Toree
>>>    - Gino Bustelo (IBM) - Toree
>>>    - Joy Chakraborty (Bloomberg) - Spark Magic with Livy
>>>    - Kyle Kelley (Netflix) - Jupyter
>>>    - Haley Most (Cloudera) - Toree
>>>    - Marius van Niekerk (Maxpoint) - Toree, Spylon
>>>    - Peter Parente (Maxpoint) - Jupyter
>>>    - Corey Stubbs (IBM) - Toree
>>>    - Jamie Whitacre (Berkeley) - Jupyter
>>>    - Tristan Zajonc (Cloudera) - Toree, Livy
>>>
>>> Each person on the call has a preferred kernel and a preferred way of
>>> building and integrating it. We have a significant user experience problem
>>> in terms of users installing and using Scala kernels, beyond just Spark
>>> usage. The overarching goal is to create a cohesive experience for Scala
>>> users when they use Jupyter.
>>>
>>> When a Scala user comes to the Jupyter ecosystem (or even a Python
>>> developer who is already familiar with it), they face many options for
>>> kernels. Being faced with a choice when trying to get things done creates
>>> new friction points for users. For examples, see
>>> https://twitter.com/chrisalbon/status/833156959150841856 and
>>> https://twitter.com/sarah_guido/status/833165030296322049.
>>> What are our foundations for REPL libraries in Scala?
>>>
>>> Toree was built on top of the Spark REPL, and its developers tried to
>>> reuse as much code as possible from Spark. For jupyter-scala, Alex
>>> recognized that the Spark REPL was changing a lot from version to version.
>>> At the same time, Ammonite <https://github.com/lihaoyi/Ammonite> was
>>> created to assist in Scala scripting. To make big data frameworks such as
>>> Spark, Flink, and Scio work well in this environment, a fork called
>>> Ammonium <https://github.com/alexarchambault/ammonium> was created.
>>> There is some trepidation about using a separate fork as part of the
>>> kernel community. We should make sure to unify with the original Ammonite
>>> and contribute back as part of a larger Scala community that can maintain
>>> these together.
>>> Action Items:
>>>
>>>    - Renew focus on Scala within Toree, and improve outward messaging
>>>      about how Toree provides a Scala kernel
>>>    - Unify Ammonite and Ammonium ([email protected])
>>>       - To be used in jupyter-scala, potentially for spylon
>>>
>>> There is more than one implementation of the Jupyter protocol in the 
>>> Java Stack.
>>>
>>> Toree has one, jupyter-scala has one, and the Clojure kernels have their
>>> own. People would like to see a stable Jupyter library for the JVM. Some
>>> think it’s better to have one per language. Regardless of the choice, we
>>> should have a well-supported Jupyter library.
>>> Action Items:
>>>
>>>    - Create an idiomatic Java library for the Jupyter messaging protocol,
>>>      and propose it as an incubation project within Jupyter
>>>
>>> Decouple Spark from Scala in kernels
>>>
>>> Decouple the language-specific parts from the computing framework to
>>> allow for using other computing frameworks. This is paramount for R and
>>> Python. When we inevitably want to connect to a GPU cluster, we want to be
>>> able to use the same kernel foundations. The reason these end up coupled
>>> is that Spark does “slightly weird things” in how it wants its classes
>>> compiled. It’s thought that this calls for some amount of specialization
>>> and that we can work around it. At the very least, we can bake that
>>> specialization into the core and leave room for other frameworks to have
>>> solid built-in support if necessary.
>>>
>>> One approach being worked on in Toree right now is lazy loading of Spark.
>>> One difference between jupyter-scala and Toree is that jupyter-scala can
>>> dynamically load Spark versions, whereas Toree is bound to a version of
>>> Spark at deployment. For end users who have operators/admins, kernels can
>>> be configured per Spark version (as is common for Python and R). Spark
>>> drives a lot of the interest in Scala kernels, and many kernels conflate
>>> the two. This results in poor messaging and a poor experience for users
>>> getting started.
>>> Action Items:
>>>
>>>    - Lazy load Spark within Toree
>>>
>>> Focus efforts within kernel communities
>>>
>>> Beyond the Scala kernel alone, we need Jupyter to acknowledge fully
>>> supported kernels. By contrast, the Zeppelin community collaborates on
>>> its interpreters in a single repository.
>>>
>>> “Fragmentation of kernels makes it harder for large enterprises to adopt 
>>> them.”
>>>
>>> - Tristan Zajonc (Cloudera)
>>>
>>> Beyond the technical question of what counts as a supported kernel, we
>>> also need our messaging to end users to be simple and clear. There are
>>> several things we need to do to improve our messaging, organization, and
>>> technical underpinnings.
>>> Action Items:
>>>
>>>    - On the Jupyter site, provide blurbs and links to kernels for R,
>>>      Python, and Scala
>>>    - Create an organized effort around the Scala kernel, possibly by
>>>      unifying in an organization while isolating projects in separate
>>>      repositories
>>>    - Align on a specification of what it takes to be acknowledged as a
>>>      supported kernel
>>>
>>> Visualization
>>>
>>> We would like to push the idea of mimetypes that output a hunk of JSON
>>> from which frontends can draw beautiful visualizations. Having these
>>> adopted in core Jupyter by default would go a long way towards providing
>>> simple visualization that just works. The current landscape of
>>> visualization with the Scala kernels includes:
>>>
>>>    - Vegas <https://github.com/vegas-viz/Vegas>
>>>    - Plotly Scala <https://github.com/alexarchambault/plotly-scala>
>>>    - Brunel <https://github.com/Brunel-Visualization/Brunel>
>>>    - Data Resource / Table Schema (see
>>>      https://github.com/pandas-dev/pandas/pull/14904)
>>>
>>> There is some worry about standardization around HTML outputs. Some
>>> libraries try to use frontend libraries that may not exist on the frontend
>>> or may be a mismatched version: jQuery, RequireJS, ipywidgets, jupyter,
>>> ipython. In some frontends, at times as dictated by the operating
>>> environment, the HTML outputs must be rendered in null-origin iframes.
>>> Action Items:
>>>
>>>    - Continue involvement in Jupyter frontends to provide rich
>>>      visualization out of the box, with less configuration and less
>>>      friction
>>>
>>> Standardizing display and reprs for Scala
>>>
>>> Since it’s likely that there will still be multiple kernels available for
>>> the JVM, not just within Scala, we want to standardize the way in which
>>> you inspect objects on the JVM. IPython provides a way for libraries to
>>> integrate with IPython automatically for users (for example, by defining
>>> _repr_html_ and similar methods on their objects). We want library
>>> developers to be able to follow a common scheme and be well represented
>>> regardless of the kernel.
>>> Action Items:
>>>    
>>>    - Create a specification for object representation for JVM languages 
>>>    as part of the Jupyter project
>>>
>>>
>>> -- 
>>> Kyle Kelley (@rgbkrk <https://twitter.com/rgbkrk>; lambdaops.com)
>>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Project Jupyter" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/jupyter/c25d13fe-3049-4356-a14b-d16fa3fefcfc%40googlegroups.com.
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
>
> -- 
> Kyle Kelley (@rgbkrk <https://twitter.com/rgbkrk>; lambdaops.com)
>
