I don't understand why users want to export a Zeppelin note as a Spark
application.

If they want to trigger the running of a Spark app, why not use Zeppelin's
REST API for that (see the sketch below)? Even if users export a note as a
Spark application, most of the time they still need to submit it through a
Spark job server, so why not use Zeppelin as the Spark job server?
And if the Spark app fails, it is pretty hard to debug, because the
exporting tool has changed/restructured the source code.
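
To be concrete about the REST API option, here is a minimal Scala sketch of
triggering a whole note from an external tool. The host, port and note id
are placeholders, and the run-all-paragraphs endpoint
(POST /api/notebook/job/{noteId}) should be double-checked against the REST
API docs for your Zeppelin version:

    import java.net.{HttpURLConnection, URL}

    object RunNote {
      def main(args: Array[String]): Unit = {
        val noteId = "2C3FXXXXX"  // placeholder note id
        val url = new URL(s"http://localhost:8080/api/notebook/job/$noteId")

        // An empty-body POST asks Zeppelin to run all paragraphs of the note.
        val conn = url.openConnection().asInstanceOf[HttpURLConnection]
        conn.setRequestMethod("POST")
        conn.setDoOutput(true)
        conn.getOutputStream.close()          // send the empty body

        println(s"Zeppelin responded with HTTP ${conn.getResponseCode}")
        conn.disconnect()
      }
    }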
 

If this is a pretty large and complicated Spark application, I don't think
Zeppelin is the proper tool for it; they'd be better off using an IDE for
that project.

BTW, after https://github.com/apache/zeppelin/pull/1799, users can define
dependencies between paragraphs, and they can run one whole note that
contains different interpreters.
 


Best Regards,
Jeff Zhang





On 1/5/17, 2:25 AM, "Luciano Resende" <luckbr1...@gmail.com> wrote:

>I have made some progress with a tool to handle the points discussed in
>this thread. It's currently a command line tool: given a Zeppelin
>notebook (note.json), it generates a Spark Scala application, compiles it
>using the compiler embedded in the Scala SDK, and then packages all these
>resources into a jar that works with the spark-submit command.
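>
>To make that concrete, the resulting jar is launched like any other Spark
>application, along these lines (the main class and jar names below are just
>placeholders, not what the tool actually emits):
>
>  spark-submit --class com.example.ExportedNote exported-note.jar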
>
>I would like to start prototyping the integration into the Zeppelin UI, and
>I was wondering if it would be OK to use the above jar as a dependency
>(e.g. from a Maven release) and integrate it into Zeppelin...
>
>Thoughts?
>
>
>On Mon, Sep 19, 2016 at 7:47 AM, Sourav Mazumder <
>sourav.mazumde...@gmail.com> wrote:
>
>> To Moon's point, this is my vision for this feature:
>>
>> 1. Users should be able to package one, several, or all of the paragraphs
>> in a notebook into a jar file that can be used with spark-submit.
>>
>> 2. The tool should automatically remove all the interactive statements
>> like print, show, etc.
>>
>> 3. The tool should automatically create a Main class, in addition to the
>> jar file(s), which will internally call the respective jar. Users can then
>> change this Main class if needed for parameterization through args (a
>> rough sketch follows below).
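>>
>> To illustrate what such a generated Main class might look like, here is a
>> minimal sketch. Everything in it (object name, app name, the args handling,
>> and the word-count body standing in for the exported paragraph code) is
>> illustrative only, not a proposed API:
>>
>>   import org.apache.spark.{SparkConf, SparkContext}
>>
>>   // Hypothetical entry point the export tool could generate; the user
>>   // edits it afterwards to wire command-line args into the job.
>>   object ExportedNoteMain {
>>     def main(args: Array[String]): Unit = {
>>       // Example parameterization through args, as in point 3 above.
>>       val inputPath = if (args.nonEmpty) args(0) else "/tmp/input"
>>
>>       val conf = new SparkConf().setAppName("exported-zeppelin-note")
>>       val sc = new SparkContext(conf)
>>       try {
>>         // Body assembled from the selected paragraphs, with interactive
>>         // calls (print, show, z.show, ...) stripped out, per point 2.
>>         val counts = sc.textFile(inputPath)
>>           .flatMap(_.split("\\s+"))
>>           .map((_, 1))
>>           .reduceByKey(_ + _)
>>         counts.saveAsTextFile(inputPath + ".counts")
>>       } finally {
>>         sc.stop()
>>       }
>>     }
>>   }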
>>
>> Regards,
>> Sourav
>>
>> On Mon, Sep 19, 2016 at 7:33 AM, Sourav Mazumder <
>> sourav.mazumde...@gmail.com> wrote:
>>
>> > I am also pretty much for this.
>> >
>> > I have received a similar request from every person and group to whom
>> > I have showcased Zeppelin.
>> >
>> > Regards,
>> > Sourav
>> >
>> > On Fri, Sep 16, 2016 at 8:06 PM, moon soo Lee <m...@apache.org> wrote:
>> >
>> >> Hi Luciano,
>> >>
>> >> I also get a lot of questions about "productizing the notebook" every
>> >> time I meet users who use Zeppelin in their work.
>> >>
>> >> I think it's actually about two different problems that Zeppelin needs
>> >> to address.
>> >>
>> >> *1) Provide a way for the interactive notebook to become part of a
>> >> production data pipeline.*
>> >>
>> >> Although Zeppelin does have a quite convenient cron-like scheduler for
>> >> each Note, the built-in cron scheduler is not ready for serious use in
>> >> production, because it lacks features like actions after
>> >> success/failure, fault tolerance, history, and so on. I think the
>> >> community is working on improving it, and it's going to take some time.
>> >> Meanwhile, any external enterprise-level job scheduler can run a Note
>> >> or Paragraph via the REST API. But we don't have any guide or examples
>> >> for it: which REST APIs users can use for this purpose, and how to use
>> >> them in various cases (e.g. with authentication on, dynamic form
>> >> parameters, etc.). I think a lot of things need to be improved to make
>> >> Zeppelin easier to fit into a production pipeline.
>> >>
>> >> *2) Provide a stable way to run Spark paragraphs.*
>> >>
>> >> Another barrier to using a notebook in a production pipeline is the
>> >> Scala REPL in SparkInterpreter. SparkInterpreter uses the Scala REPL to
>> >> provide an interactive Scala session, and the Scala REPL will
>> >> eventually hit an OOME as it keeps compiling and running statements.
>> >> The current workaround in Zeppelin is that the cron scheduler inside
>> >> the notebook has a checkbox that restarts the Note after the scheduler
>> >> runs it. Of course, that option does not apply when an external
>> >> scheduler runs the job through the REST API.
>> >>
>> >> I think what Luciano is suggesting, "Export Spark Paragraphs as a
>> >> Spark application", is interesting. If Spark paragraphs can be easily
>> >> packaged into a jar (a Spark application), that can be one way to
>> >> address 1) and 2), in case the user already has a stable way to
>> >> schedule a Spark application jar.
>> >>
>> >> Actually, the Flink interactive shell works in a similar way
>> >> internally as far as I know, i.e. it packages the compiled classes into
>> >> a jar and submits it.
>> >>
>> >> One idea for prototyping:
>> >> how about making an interpreter inside the Spark interpreter group,
>> >> say %spark.build or some better name.
>> >>
>> >> And if the user runs a command like
>> >>
>> >> %spark.build
>> >> package
>> >>
>> >> then it builds a Spark application jar based on the Spark paragraphs
>> >> in the Note. I think that could be the simplest user interface for the
>> >> prototype.
>> >>
>> >> Thanks,
>> >> moon
>> >>
>> >> On Fri, Sep 16, 2016 at 1:11 PM Jeremy Anderson <
>> >> jer...@objectadjective.com>
>> >> wrote:
>> >>
>> >> > Luciano, I think this would be a terrific feature. I've heard the
>> >> > exact same workflow you've described in all of the research we've
>> >> > done.
>> >> >
>> >> > ...........................
>> >> >
>> >> > Jeremy Anderson
>> >> > Founder, Object Adjective
>> >> > 415.493.8489
>> >> > jer...@objectadjective.com
>> >> > objectadjective.com <http://about.me/jeremyanderson>
>> >> >
>> >> >
>> >> > On 16 September 2016 at 12:19, Luciano Resende
>><luckbr1...@gmail.com>
>> >> > wrote:
>> >> >
>> >> > > While talking with a few different users, I have been seeing one
>> >> > > use case repeat itself very often: iterative development in
>> >> > > notebooks or the Spark shell, followed by copying and pasting the
>> >> > > final solution into a formal application.
>> >> > >
>> >> > > I was wondering if "Export Spark Paragraphs as a Spark Application
>> >> > > (jar)" would be a feature the Zeppelin community would find useful.
>> >> > > Keep in mind there are some limitations here: we would be
>> >> > > constrained to Spark-related paragraphs, etc... but even so, I
>> >> > > think there are multiple scenarios where the ability to have an
>> >> > > application that runs directly on Spark would be very useful.
>> >> > >
>> >> > > If the community is interested, let's use this thread to discuss
>> >> > > any specific requirements or suggestions that others might have,
>> >> > > and after a few days I would like to start prototyping this
>> >> > > functionality.
>> >> > >
>> >> > > Thoughts?
>> >> > >
>> >> > >
>> >> > >
>> >> > > --
>> >> > > Luciano Resende
>> >> > > http://twitter.com/lresende1975
>> >> > > http://lresende.blogspot.com/
>> >> > >
>> >> >
>> >>
>> >
>> >
>>
>
>
>
>-- 
>Luciano Resende
>http://twitter.com/lresende1975
>http://lresende.blogspot.com/
