To Moon's point, this is my vision for this feature:

1. Users should be able to package one, several, or all of the paragraphs
in a Notebook into a jar file that can be used with spark-submit.

2. The tool should automatically remove all the interactive statements
like print, show, etc.

3. In addition to the jar file(s), the tool should automatically create a
Main class that internally invokes the respective jar. Users can then
modify this Main class if needed, e.g. for parameterization through args
(a rough sketch is shown below).
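
A purely illustrative sketch of what such a generated Main class might look
like (the object name, app name, and placeholder body are my assumptions;
nothing is generated like this today):

import org.apache.spark.sql.SparkSession

// Hypothetical wrapper the export tool could emit around the packaged
// paragraph code, so the resulting jar can be launched with spark-submit.
object NoteMain {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("exported-zeppelin-note")
      .getOrCreate()
    try {
      // The code compiled from the selected paragraphs would run here,
      // with interactive calls such as show()/print already stripped out.
      // Placeholder standing in for that paragraph code:
      spark.range(0, 10).count()
    } finally {
      spark.stop()
    }
  }
}

Users could then adjust main() to read whatever args they need before the
paragraph code runs.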

Regards,
Sourav

On Mon, Sep 19, 2016 at 7:33 AM, Sourav Mazumder <
sourav.mazumde...@gmail.com> wrote:

> I am also very much in favor of this.
>
> I have received a similar request from every person/group to whom I have
> showcased Zeppelin.
>
> Regards,
> Sourav
>
> On Fri, Sep 16, 2016 at 8:06 PM, moon soo Lee <m...@apache.org> wrote:
>
>> Hi Luciano,
>>
>> I've also gotten a lot of questions about "productizing the notebook"
>> every time I meet users who use Zeppelin in their work.
>>
>> I think it's actually about two different problems that Zeppelin needs
>> to address.
>>
>> *1) Provide a way for the interactive notebook to become part of a
>> production data pipeline.*
>>
>> Although Zeppelin does have a quite convenient cron-like scheduler for
>> each Note, the built-in cron scheduler is not ready for serious use in
>> production, because it lacks features such as actions on success/failure,
>> fault tolerance, history, and so on. I think the community is working on
>> improving it, and it's going to take some time.
>> Meanwhile, any external enterprise-level job scheduler can run a Note or
>> Paragraph via the REST API. But we don't have any guide or examples for
>> it: which REST APIs users can call for this purpose, and how to use them
>> in various cases (e.g. with authentication on, with dynamic form
>> parameters, etc.). I think a lot of things need to be improved to make
>> Zeppelin easier to fit into a production pipeline.
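>>
>> As a minimal sketch (the host, port, and exact endpoint path here are my
>> assumptions, please check the notebook REST API documentation for your
>> Zeppelin version), an external scheduler could trigger a whole Note
>> roughly like this:
>>
>> import java.net.{HttpURLConnection, URL}
>>
>> object TriggerNote {
>>   def main(args: Array[String]): Unit = {
>>     val noteId = args.headOption.getOrElse("2ABCDEFGH") // hypothetical id
>>     // Assumed "run all paragraphs" endpoint of the notebook REST API.
>>     val url = new URL(s"http://zeppelin-host:8080/api/notebook/job/$noteId")
>>     val conn = url.openConnection().asInstanceOf[HttpURLConnection]
>>     conn.setRequestMethod("POST")
>>     // With authentication enabled, a session cookie or auth header
>>     // would also need to be supplied here.
>>     println(s"Zeppelin responded with HTTP ${conn.getResponseCode}")
>>     conn.disconnect()
>>   }
>> }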
>>
>> *2) Provide a stable way to run Spark paragraphs.*
>>
>> Another barrier to using a notebook in a production pipeline is the Scala
>> REPL in SparkInterpreter. SparkInterpreter uses the Scala REPL to provide
>> an interactive Scala session, and the Scala REPL will eventually hit an
>> OOME as it keeps compiling and running statements. The current workaround
>> in Zeppelin is that the cron scheduler inside a notebook has a checkbox
>> that restarts the Note after the scheduler runs it. Of course, that option
>> does not apply when an external scheduler runs the job through the REST
>> API.
>>
>> I think what Luciano is suggesting, "Export Spark Paragraphs as a Spark
>> application", is interesting. If Spark paragraphs can be easily packaged
>> into a jar (a Spark application), that can be one way to address both 1)
>> and 2), in case the user already has a stable way to schedule a Spark
>> application jar.
>>
>> Actually, the Flink interactive shell works in a similar way internally,
>> as far as I know, i.e. it packages the compiled classes into a jar and
>> submits them.
>>
>> One idea for prototyping: how about adding an interpreter inside the
>> Spark interpreter group, say %spark.build or some better name?
>>
>> And if the user runs a command like
>>
>> %spark.build
>> package
>>
>> it builds a Spark application jar based on the Spark paragraphs in the
>> Note. I think that could be the simplest user interface for the prototype.
>>
>> Thanks,
>> moon
>>
>> On Fri, Sep 16, 2016 at 1:11 PM Jeremy Anderson <
>> jer...@objectadjective.com>
>> wrote:
>>
>> > Luciano, I think this would be a terrific feature. I've heard the exact
>> > same workflow you've described in all of the research we've done.
>> >
>> > ...........................
>> >
>> > Jeremy Anderson
>> > Founder, Object Adjective
>> > 415.493.8489
>> > jer...@objectadjective.com
>> > objectadjective.com <http://about.me/jeremyanderson>
>> >
>> > On 16 September 2016 at 12:19, Luciano Resende <luckbr1...@gmail.com>
>> > wrote:
>> >
>> > > While talking with a few different users, I have repeatedly seen the
>> > > same pattern: iterative development in Notebooks or the Spark shell,
>> > > followed by copying and pasting the final solution into a formal
>> > > application.
>> > >
>> > > I was wondering whether an "Export Spark Paragraphs as a Spark
>> > > Application (jar)" feature is something the Zeppelin community would
>> > > find useful. Keep in mind there are some limitations here: we would be
>> > > constrained to Spark-related paragraphs, etc. But even so, I think
>> > > there are multiple scenarios where the ability to produce an
>> > > application that runs directly on Spark would be very useful.
>> > >
>> > > If the community is interested, let's use this thread to discuss any
>> > > specific requirements or suggestions that others might have, and
>> after a
>> > > few days I would like to start prototyping this functionality.
>> > >
>> > > Thoughts?
>> > >
>> > >
>> > >
>> > > --
>> > > Luciano Resende
>> > > http://twitter.com/lresende1975
>> > > http://lresende.blogspot.com/
>> > >
>> >
>>
>
>
