I have made some progress on a tool that handles the points discussed in
this thread. It is currently a command-line tool: given a Zeppelin
notebook (note.json), it generates a Spark Scala application, compiles it
using the compiler embedded in the Scala SDK, and then packages all these
resources into a jar that works with the spark-submit command.
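
To make the first step of that pipeline concrete (note.json to Scala
source), here is a minimal sketch written in Python for brevity. It is not
the tool's actual code. The function names, the NotebookApp object name, and
the rule that only paragraphs whose first line is a bare %spark directive
are exported are all assumptions for illustration. It also assumes that
note.json has a top-level "paragraphs" list whose entries carry a "text"
field, and that the generated code targets Spark 2.x (SparkSession).

```python
import json


def spark_paragraphs(note):
    """Return the bodies of paragraphs whose first line is exactly '%spark'.

    Assumption: other interpreters (%md, %spark.sql, ...) and paragraphs
    without a directive are skipped in this sketch.
    """
    bodies = []
    for paragraph in note.get("paragraphs", []):
        text = (paragraph.get("text") or "").strip()
        first_line, _, rest = text.partition("\n")
        if first_line.strip() == "%spark":
            bodies.append(rest.strip())
    return bodies


def render_scala_app(note, object_name="NotebookApp"):
    """Wrap the selected paragraph bodies in a Scala object with a main method.

    'object_name' is a hypothetical default. The generated source defines
    'spark' and 'sc' itself, because the notebook normally provides them.
    """
    body = "\n\n".join(spark_paragraphs(note))
    indented = "\n".join(
        "    " + line if line else "" for line in body.splitlines()
    )
    return (
        "import org.apache.spark.sql.SparkSession\n\n"
        f"object {object_name} {{\n"
        "  def main(args: Array[String]): Unit = {\n"
        "    val spark = SparkSession.builder().getOrCreate()\n"
        "    val sc = spark.sparkContext\n"
        f"{indented}\n"
        "    spark.stop()\n"
        "  }\n"
        "}\n"
    )


if __name__ == "__main__":
    # Inline sample standing in for a real note.json file.
    sample = json.loads(
        '{"paragraphs": ['
        '{"text": "%spark\\nval n = sc.parallelize(1 to 10).count()"},'
        '{"text": "%md\\n# notes"}]}'
    )
    print(render_scala_app(sample))
```

Compiling the generated source and packaging the jar are separate steps
that this sketch does not cover.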

I would like to start prototyping the integration into the Zeppelin UI, and
I was wondering if it would be OK to use the above jar as a dependency
(e.g. from a Maven release) and integrate it into Zeppelin...

Thoughts ?


On Mon, Sep 19, 2016 at 7:47 AM, Sourav Mazumder <
sourav.mazumde...@gmail.com> wrote:

> To Moon's point, this is my vision for this feature:
>
> 1. Users should be able to package one, several, or all of the paragraphs
> in a Notebook into a jar file that can be used with spark-submit.
>
> 2. The tool should automatically remove all the interactive statements
> like print, show, etc.
>
> 3. The tool should automatically create a Main class in addition to the jar
> file(s), which will internally call the respective jar. The user can then
> change this Main class if needed for parameterization through args.
>
> Regards,
> Sourav
>
> On Mon, Sep 19, 2016 at 7:33 AM, Sourav Mazumder <
> sourav.mazumde...@gmail.com> wrote:
>
> > I am also very much in favor of this.
> >
> > I have received a similar request from every person and group to whom I
> > have showcased Zeppelin.
> >
> > Regards,
> > Sourav
> >
> > On Fri, Sep 16, 2016 at 8:06 PM, moon soo Lee <m...@apache.org> wrote:
> >
> >> Hi Luciano,
> >>
> >> I've also got a lot of questions about "Productize the notebook" every
> >> time I meet users who use Zeppelin in their work.
> >>
> >> I think it's actually about two different problems that Zeppelin needs to
> >> address.
> >>
> >> *1) Provide a way for an interactive notebook to become part of a
> >> production data pipeline.*
> >>
> >> Although Zeppelin does have a quite convenient cron-like scheduler for
> >> each Note, the built-in cron scheduler is not ready for serious
> >> production use because it lacks features such as actions after
> >> success/failure, fault tolerance, history, and so on. I think the
> >> community is working on improving it, but it's going to take some time.
> >> Meanwhile, any external enterprise-level job scheduler can run a Note or
> >> Paragraph via the REST API. But we don't have any guide or examples
> >> explaining which REST APIs users can use for this purpose and how to use
> >> them in various cases (e.g. with authentication on, dynamic form
> >> parameters, etc.). I think a lot of things need to be improved to make
> >> it easier for Zeppelin to be part of a production pipeline.
> >>
> >> *2) Provide a stable way to run Spark paragraphs.*
> >>
> >> Another barrier to using a notebook in a production pipeline is the
> >> Scala REPL in SparkInterpreter. SparkInterpreter uses the Scala REPL to
> >> provide an interactive Scala session, and the Scala REPL will eventually
> >> hit an OOME as it compiles and runs statements. The current workaround
> >> in Zeppelin is that the cron scheduler inside the notebook has a
> >> checkbox that can restart the Note after the scheduler runs it. Of
> >> course, that option does not apply when an external scheduler runs the
> >> job through the REST API.
> >>
> >> I think what Luciano is suggesting, "Export Spark Paragraph as Spark
> >> application", is interesting. If Spark paragraphs can be easily packaged
> >> into a jar (a Spark application), that can be one way to address 1) and
> >> 2), provided the user already has a stable way to schedule a Spark
> >> application jar.
> >>
> >> Actually, the Flink interactive shell works in a similar way internally,
> >> as far as I know, i.e. it packages the compiled classes into a jar and
> >> submits it.
> >>
> >> One idea for prototyping:
> >> how about making an interpreter inside the Spark interpreter group,
> >> called %spark.build or some better name?
> >>
> >> And if the user runs a command like
> >>
> >> %spark.build
> >> package
> >>
> >> then it builds a Spark application jar based on the Spark paragraphs in
> >> the Note. I think it can be the simplest user interface for the
> >> prototype.
> >>
> >> Thanks,
> >> moon
> >>
> >> On Fri, Sep 16, 2016 at 1:11 PM Jeremy Anderson <
> >> jer...@objectadjective.com>
> >> wrote:
> >>
> >> > Luciano, I think this would be a terrific feature. I've heard the
> >> > exact same workflow you've described in all of the research we've done.
> >> >
> >> > ...........................
> >> >
> >> > Jeremy Anderson
> >> > Founder, Object Adjective
> >> > 415.493.8489
> >> > jer...@objectadjective.com
> >> > objectadjective.com <http://about.me/jeremyanderson>
> >> >
> >> >
> >> > On 16 September 2016 at 12:19, Luciano Resende <luckbr1...@gmail.com>
> >> > wrote:
> >> >
> >> > > While talking with a few different users, I have repeatedly seen the
> >> > > same use case: developing iteratively in Notebooks or the Spark Shell,
> >> > > and then copying and pasting the final solution into a formal
> >> > > application.
> >> > >
> >> > > I was wondering if "Export Spark Paragraphs as a Spark Application
> >> > > (jar)" is a feature that the Zeppelin community would find useful.
> >> > > Keep in mind that there are some limitations here: we would be
> >> > > constrained to Spark-related paragraphs, etc. Even so, I think there
> >> > > are multiple scenarios where the ability to have an application that
> >> > > runs directly on Spark would be very useful.
> >> > >
> >> > > If the community is interested, let's use this thread to discuss any
> >> > > specific requirements or suggestions that others might have, and
> >> > > after a few days I would like to start prototyping this
> >> > > functionality.
> >> > >
> >> > > Thoughts ?
> >> > >
> >> > >
> >> > >
> >> > > --
> >> > > Luciano Resende
> >> > > http://twitter.com/lresende1975
> >> > > http://lresende.blogspot.com/
> >> > >
> >> >
> >>
> >
> >
>



-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/
