I have made some progress on a tool that handles the points discussed in this thread. It's currently a command-line tool: given a Zeppelin notebook (note.json), it generates a Spark Scala application, compiles it using the compiler embedded in the Scala SDK, and then packages all these resources into a jar that works with the spark-submit command.
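A rough sketch of the note.json-to-application step in Scala. It assumes the paragraph texts have already been read from note.json (the `text` field of each entry under `paragraphs`) with some JSON library, and that paragraphs without an interpreter prefix are bound to `%spark`. `NoteToApp` and its methods are hypothetical names, not the tool's actual code, and compiling and jarring the generated source is not shown. The filtering step also takes a crude, line-based stab at the requirements Sourav lists further down the thread (dropping print/show calls, generating a main class that can read `args`).

```scala
// Hypothetical sketch: names and heuristics are illustrative, not the actual tool's API.
object NoteToApp {

  // Assumption: paragraphs without a %interpreter prefix use %spark (the note's default binding).
  def isSparkScala(text: String): Boolean = {
    val firstToken = text.trim.takeWhile(c => !c.isWhitespace)
    !firstToken.startsWith("%") || firstToken == "%spark"
  }

  // Removes a leading %spark directive, keeping the code that follows it.
  def stripDirective(text: String): String = {
    val trimmed = text.trim
    if (trimmed.startsWith("%spark")) trimmed.drop("%spark".length).trim else trimmed
  }

  // Crude line-based filter for interactive output (println/print, df.show(...), z.show(...)).
  // A real tool would work on the parsed Scala syntax tree rather than on raw lines.
  def isInteractive(line: String): Boolean = {
    val l = line.trim
    l.startsWith("println(") || l.startsWith("print(") || l.startsWith("z.") ||
      l.matches(""".*\.show(\(.*\))?""")
  }

  // Wraps the remaining Spark Scala code in an object with a main method for spark-submit.
  def generate(appName: String, paragraphs: Seq[String]): String = {
    val bodyLines = paragraphs
      .filter(isSparkScala)
      .map(stripDirective)
      .flatMap(_.split("\n").toSeq)
      .filterNot(isInteractive)
      .map(line => "    " + line)
    val header = Seq(
      "import org.apache.spark.sql.SparkSession",
      "",
      s"object $appName {",
      "  def main(args: Array[String]): Unit = {",
      "    // Recreate the variables Zeppelin predefines for %spark paragraphs.",
      "    val spark = SparkSession.builder().appName(\"" + appName + "\").getOrCreate()",
      "    val sc = spark.sparkContext",
      "    import spark.implicits._")
    val footer = Seq("    spark.stop()", "  }", "}")
    (header ++ bodyLines ++ footer).mkString("\n")
  }
}
```

For example, `NoteToApp.generate("MyNoteApp", Seq("%spark\nval df = spark.range(10)\ndf.show()\ndf.write.parquet(args(0))", "%md # Notes"))` keeps the `val df` line and the `parquet` write, and drops `df.show()` and the markdown paragraph. Definitions that Spark needs at top level, such as case classes used with `toDS()`, would likely have to be hoisted out of `main`; this sketch does not do that.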
I would like to start prototyping the integration into the Zeppelin UI, and I was wondering if it would be OK to use the above jar as a dependency (e.g. from a Maven release) and integrate it into Zeppelin... Thoughts?

On Mon, Sep 19, 2016 at 7:47 AM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:

> To Moon's point, this is my vision for this feature:
>
> 1. Users should be able to package one, several, or all of the paragraphs
> in a notebook into a jar file that can be used with spark-submit.
>
> 2. The tool should automatically remove all interactive statements
> like print, show, etc.
>
> 3. The tool should automatically create a Main class in addition to the
> jar file(s), which will internally call the respective jar. Users can then
> change this main class if needed, e.g. for parameterization through args.
>
> Regards,
> Sourav
>
> On Mon, Sep 19, 2016 at 7:33 AM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:
>
> > I am also very much in favor of this.
> >
> > I have received a similar request from every person and group I have
> > shown Zeppelin to.
> >
> > Regards,
> > Sourav
> >
> > On Fri, Sep 16, 2016 at 8:06 PM, moon soo Lee <m...@apache.org> wrote:
> >
> >> Hi Luciano,
> >>
> >> I've also gotten a lot of questions about "Productize the notebook"
> >> every time I meet users who use Zeppelin in their work.
> >>
> >> I think it's actually about two different problems that Zeppelin needs
> >> to address.
> >>
> >> *1) Provide a way for an interactive notebook to become part of a
> >> production data pipeline.*
> >>
> >> Although Zeppelin does have a quite convenient cron-like scheduler for
> >> each Note, the built-in cron scheduler is not ready for serious
> >> production use, because it lacks features like actions after
> >> success/failure, fault tolerance, history, and so on. I think the
> >> community is working on improving it, and it's going to take some time.
> >>
> >> Meanwhile, any external enterprise-level job scheduler can run a Note
> >> or Paragraph via the REST API. But we don't have any guide or examples
> >> for it: which REST APIs users can use for this purpose, and how to use
> >> them in various cases (e.g. with authentication on, dynamic form
> >> parameters, etc.). I think a lot of things need to be improved to make
> >> it easier for Zeppelin to be part of a production pipeline.
> >>
> >> *2) Provide a stable way to run Spark paragraphs.*
> >>
> >> Another barrier to using notebooks in a production pipeline is the
> >> Scala REPL in SparkInterpreter. SparkInterpreter uses the Scala REPL to
> >> provide an interactive Scala session, and the Scala REPL will eventually
> >> hit an OOME (OutOfMemoryError) as it compiles and runs statements. The
> >> current workaround in Zeppelin is that the cron scheduler inside the
> >> notebook has a checkbox that can restart the Note after the scheduler
> >> runs it. Of course, that option does not apply when an external
> >> scheduler runs the job through the REST API.
> >>
> >> I think what Luciano is suggesting, "Export Spark Paragraph as Spark
> >> application", is interesting. If Spark paragraphs can be easily packaged
> >> into a jar (Spark application), that could be one way to address 1) and
> >> 2), in cases where the user already has a stable way to schedule Spark
> >> application jars.
> >>
> >> Actually, the Flink interactive shell works in a similar way internally,
> >> as far as I know, i.e. it packages compiled classes into a jar and
> >> submits it.
> >>
> >> One idea for prototyping:
> >> How about making an interpreter inside the Spark interpreter group, say
> >> %spark.build or some better name?
> >>
> >> If the user runs a command like
> >>
> >> %spark.build
> >> package
> >>
> >> then it builds a Spark application jar based on the Spark paragraphs in
> >> the Note. I think it can be the simplest user interface for the
> >> prototype.
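For the REST route Moon mentions, a minimal sketch of the requests an external scheduler might send, in Scala using the JDK 11+ `java.net.http` client (an assumption; any HTTP client would do). The endpoints follow Zeppelin's notebook REST API documentation as I understand it (`POST /api/notebook/job/{noteId}` runs all paragraphs of a note, `POST /api/notebook/job/{noteId}/{paragraphId}` runs one paragraph, with an optional `params` object for dynamic form values); verify them against the docs for your Zeppelin version. `ZeppelinJobTrigger`, the IDs, and the cookie-based authentication handling are hypothetical.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

// Hypothetical helper for an external scheduler; not part of Zeppelin itself.
object ZeppelinJobTrigger {

  // Quotes a string as a JSON string literal (escapes only backslash and double quote).
  private def q(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  // Body that overrides dynamic form values, e.g. {"params": {"date": "2016-09-16"}}.
  def paramsJson(params: Map[String, String]): String =
    params.map { case (k, v) => q(k) + ": " + q(v) }.mkString("{\"params\": {", ", ", "}}")

  // POST {baseUrl}/api/notebook/job/{noteId}: run all paragraphs of a note.
  def runNoteRequest(baseUrl: String, noteId: String,
                     sessionCookie: Option[String] = None): HttpRequest = {
    val b = HttpRequest.newBuilder(URI.create(s"$baseUrl/api/notebook/job/$noteId"))
      .POST(HttpRequest.BodyPublishers.noBody())
    // Assumption: with authentication on, pass a cookie such as "JSESSIONID=..." from a prior login.
    sessionCookie.foreach(c => b.header("Cookie", c))
    b.build()
  }

  // POST {baseUrl}/api/notebook/job/{noteId}/{paragraphId}: run one paragraph,
  // sending a JSON body only when dynamic form parameters are given.
  def runParagraphRequest(baseUrl: String, noteId: String, paragraphId: String,
                          params: Map[String, String] = Map.empty,
                          sessionCookie: Option[String] = None): HttpRequest = {
    val body =
      if (params.isEmpty) HttpRequest.BodyPublishers.noBody()
      else HttpRequest.BodyPublishers.ofString(paramsJson(params))
    val b = HttpRequest.newBuilder(URI.create(s"$baseUrl/api/notebook/job/$noteId/$paragraphId"))
      .header("Content-Type", "application/json")
      .POST(body)
    sessionCookie.foreach(c => b.header("Cookie", c))
    b.build()
  }

  // Sends a request and returns (status code, response body).
  def send(request: HttpRequest): (Int, String) = {
    val response = HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
    (response.statusCode(), response.body())
  }
}
```

A scheduler would call something like `ZeppelinJobTrigger.send(ZeppelinJobTrigger.runNoteRequest("http://localhost:8080", noteId))`. As far as I know these job endpoints start the run asynchronously, so a 200 status means the run was accepted, not that it succeeded; tracking success or failure would need a separate status check.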
> >>
> >> Thanks,
> >> moon
> >>
> >> On Fri, Sep 16, 2016 at 1:11 PM Jeremy Anderson <jer...@objectadjective.com> wrote:
> >>
> >> > Luciano, I think this would be a terrific feature. I've heard the
> >> > exact same workflow you've described in all of the research we've done.
> >> >
> >> > Jeremy Anderson
> >> > Founder, Object Adjective
> >> > 415.493.8489
> >> > jer...@objectadjective.com
> >> > objectadjective.com <http://about.me/jeremyanderson>
> >> >
> >> > On 16 September 2016 at 12:19, Luciano Resende <luckbr1...@gmail.com> wrote:
> >> >
> >> > > While talking with a few different users, I have been seeing one
> >> > > use case repeat itself very often: iterative development in
> >> > > notebooks or the Spark shell, followed by copying and pasting the
> >> > > final solution into a formal application.
> >> > >
> >> > > I was wondering if "Export Spark Paragraphs as a Spark Application
> >> > > (jar)" would be a feature that the Zeppelin community would find
> >> > > useful. Keep in mind there are some limitations here: we would be
> >> > > constrained to Spark-related paragraphs, etc. But even so, I think
> >> > > there are multiple scenarios where the ability to have an
> >> > > application that runs directly on Spark would be very useful.
> >> > >
> >> > > If the community is interested, let's use this thread to discuss
> >> > > any specific requirements or suggestions that others might have,
> >> > > and after a few days I would like to start prototyping this
> >> > > functionality.
> >> > >
> >> > > Thoughts?
> >> > >
> >> > > --
> >> > > Luciano Resende
> >> > > http://twitter.com/lresende1975
> >> > > http://lresende.blogspot.com/

-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/