Hi Luciano, maybe I am wrong, just my two cents for your consideration.
On Fri, Jan 6, 2017 at 8:32 AM, Jeff Zhang <zjf...@gmail.com> wrote:
>
> Thanks Luciano. I am not saying the community doesn't feel this is a good idea. It's just my personal opinion (maybe with some bias, as I haven't talked with as many customers as you have). I just feel you could spend the time on improving Zeppelin so that Zeppelin does the job, rather than exporting the jar and leveraging other tools to deploy it. I don't want you to waste time only to find out in the end that customers are happy to do all of this in one central place: Zeppelin. Anyway, this is just my personal thinking; you can talk with your customers to hear their feedback.
>
> On Fri, Jan 6, 2017 at 5:01 AM, Luciano Resende <luckbr1...@gmail.com> wrote:
>
> Hi Jeff,
>
> I agree with you that what you mentioned is completely acceptable for some users, particularly the data science personas. Having said that, I have been talking with multiple enterprise companies that have their own scheduler infrastructure with different quality of service, or that just want to deploy this as an app into their production environment, which will have much more resources for running these apps with complete data sets. Currently they finish the experiment/development of the application in an interactive environment and then move their final code into a native Spark application.
>
> Zeppelin is evolving quickly in this area, and I think that export as an application might be a good option for users who want to actually deploy their notebooks as native applications into their own Spark cluster.
>
> Having said that, if the community feels that this is not a required function in Zeppelin anymore, then I can continue with the development of the tool as a standalone command line tool. I was even thinking about expanding the functionality and implementing what is described in ZEPPELIN-1793.
>
> Thoughts?
>
> On Thu, Jan 5, 2017 at 12:38 AM, Jeff Zhang <zjf...@gmail.com> wrote:
>
> > Thanks Luciano. IIRC, what users want is to run the whole Spark app, but they don't care whether it runs in Zeppelin or through a standard Spark app jar. I know Zeppelin currently doesn't do well at converting a note to a production Spark app, as Lee mentioned. But exporting a note as a jar seems a short-term solution, not a long-term one. I just feel that when Zeppelin improves in this field, users might abandon this solution and move back to Zeppelin. Here are some disadvantages I can see in this approach:
> >
> > 1. If users want to change the code during iterative development, they have to repeat the whole pipeline (change code in Zeppelin -> export it to a Spark jar -> redeploy this jar). This process is painful and wastes time.
> > 2. It is hard to debug and diagnose, as the code is changed/restructured when exporting to a jar.
> > 3. Users have to leverage several distinct tools for the whole development cycle (Zeppelin, Spark job server, and maybe a cron job).
> >
> > Besides, the OOM issue of the Spark REPL that Lee mentioned might not be a problem, because we can shut down the app (close the interpreter) after the app is done.
> >
> > On Thu, Jan 5, 2017 at 3:59 PM, Luciano Resende <luckbr1...@gmail.com> wrote:
> >
> > Some use cases discussed earlier on this thread:
> >
> > https://www.mail-archive.com/dev@zeppelin.apache.org/msg06323.html
> >
> > https://www.mail-archive.com/dev@zeppelin.apache.org/msg06332.html
> >
> > On Wed, Jan 4, 2017 at 4:51 PM, Jianfeng (Jeff) Zhang <jzh...@hortonworks.com> wrote:
> >
> > > I don't understand why users want to export a Zeppelin note as a Spark application.
> > >
> > > If they want to trigger the running of a Spark app, why not use Zeppelin's REST API for that? Even if users export it as a Spark application, most of the time in reality they need to submit it through a Spark job server, so why not use Zeppelin as a Spark job server?
> > > And if the Spark app fails, it is pretty hard to debug, because the exporting tool has changed/restructured the source code.
> > >
> > > If this is a pretty large and complicated Spark application, I don't think Zeppelin is a proper tool for that; they had better use an IDE for that project.
> > >
> > > BTW, after https://github.com/apache/zeppelin/pull/1799, users can define the dependency between paragraphs, and they can run one whole note which contains different interpreters.
> > >
> > > Best Regards,
> > > Jeff Zhang
> > >
> > > On 1/5/17, 2:25 AM, "Luciano Resende" <luckbr1...@gmail.com> wrote:
> > >
> > > > I have made some progress with a tool to handle the points discussed in this thread. It's currently a command line tool: given a Zeppelin notebook (note.json), it generates a Spark Scala application, compiles it using the compiler embedded in the Scala SDK, and then packages all these resources into a jar that works with the spark-submit command.
> > > >
> > > > I would like to start prototyping the integration into the Zeppelin UI, and I was wondering if it would be OK to use the above jar as a dependency (e.g. from a Maven release) and integrate it into Zeppelin...
> > > >
> > > > Thoughts?
> > > >
> > > > On Mon, Sep 19, 2016 at 7:47 AM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:
> > > >
> > > > > To Moon's point, this is what my vision is around this feature:
> > > > >
> > > > > 1. User should be able to package one, more than one, or all of the paragraphs in a Notebook to create a jar file which can be used with spark-submit.
> > > > >
> > > > > 2. The tool should automatically remove all the interactive statements like print, show, etc.
> > > > >
> > > > > 3. The tool should automatically create a Main class in addition to the jar file(s), which will internally call the respective jar. User can then change this Main class if needed for parameterization through Args.
> > > > >
> > > > > Regards,
> > > > > Sourav
> > > > >
> > > > > On Mon, Sep 19, 2016 at 7:33 AM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:
> > > > >
> > > > > > I am also pretty much for this.
> > > > > >
> > > > > > I have got a similar request from each and every person/group to whom I showcased Zeppelin.
> > > > > >
> > > > > > Regards,
> > > > > > Sourav
> > > > > >
> > > > > > On Fri, Sep 16, 2016 at 8:06 PM, moon soo Lee <m...@apache.org> wrote:
> > > > > >
> > > > > > > Hi Luciano,
> > > > > > >
> > > > > > > I've also got a lot of questions about "Productize the notebook" every time I meet users who use Zeppelin in their work.
> > > > > > >
> > > > > > > I think it's actually about two different problems that Zeppelin needs to address.
> > > > > > >
> > > > > > > *1) Provide a way for an interactive notebook to become part of a production data pipeline.*
> > > > > > >
> > > > > > > Although Zeppelin does have a quite convenient cron-like scheduler for each Note, the built-in cron scheduler is not ready for serious use in production, because it lacks features like actions after success/fail, fault tolerance, history, and so on. I think the community is working on improving it, and it's going to take some time.
> > > > > > > Meanwhile, any external enterprise-level job scheduler can run a Note or Paragraph via the REST API. But we don't have any guide or examples for it: what REST APIs users can use for this purpose, and how to use them in various cases (e.g. with authentication on, dynamic form parameters, etc). I think a lot of things need to be improved to make Zeppelin easier to use as part of a production pipeline.
> > > > > > >
> > > > > > > *2) Provide a stable way to run Spark paragraphs.*
> > > > > > >
> > > > > > > Another barrier to using a notebook in a production pipeline is the Scala REPL in SparkInterpreter. SparkInterpreter uses the Scala REPL to provide an interactive Scala session, and the Scala REPL will eventually hit OOME as it compiles and runs statements. The current workaround in Zeppelin is that the cron scheduler inside the notebook has a checkbox that can restart the Note after the scheduler runs it. Of course that option does not apply when an external scheduler runs the job through the REST API.
> > > > > > >
> > > > > > > I think what Luciano is suggesting, "Export Spark Paragraph as Spark application", is interesting. If Spark paragraphs can be easily packaged into a jar (Spark application), that can be one way to address 1) and 2), in case the user already has a stable way to schedule a Spark application jar.
> > > > > > >
> > > > > > > Actually, the Flink interactive shell works in a similar way internally as far as I know, i.e. it packages compiled classes into a jar and submits it.
> > > > > > >
> > > > > > > One idea for prototyping: how about making an interpreter inside the Spark interpreter group, say %spark.build or some better name?
> > > > > > >
> > > > > > > And if the user runs some command like
> > > > > > >
> > > > > > > %spark.build
> > > > > > > package
> > > > > > >
> > > > > > > then it builds a Spark application jar based on the Spark paragraphs in the Note. I think it can be the simplest user interface for the prototype.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > moon
> > > > > > >
> > > > > > > On Fri, Sep 16, 2016 at 1:11 PM Jeremy Anderson <jer...@objectadjective.com> wrote:
> > > > > > >
> > > > > > > > Luciano, I think this would be a terrific feature. I've heard the exact same workflow you've described in all of the research we've done.
> > > > > > > >
> > > > > > > > ...........................
> > > > > > > >
> > > > > > > > Jeremy Anderson
> > > > > > > > Founder, Object Adjective
> > > > > > > > 415.493.8489
> > > > > > > > jer...@objectadjective.com
> > > > > > > > objectadjective.com <http://about.me/jeremyanderson>
> > > > > > > >
> > > > > > > > This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed.
> > > > > > > >
> > > > > > > > On 16 September 2016 at 12:19, Luciano Resende <luckbr1...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > While talking with a few different users, I have very often seen the same use case repeat itself: iterative development in notebooks or the Spark shell, followed by copying and pasting the final solution into a formal application.
> > > > > > > > >
> > > > > > > > > I was wondering if an "Export Spark Paragraphs as a Spark Application (jar)" would be a feature that the Zeppelin community would find useful. Keep in mind there are some limitations here: we would be constrained to Spark-related paragraphs, etc... but even so, I think there are multiple scenarios where the ability to have an application that runs directly on Spark would be very useful.
> > > > > > > > >
> > > > > > > > > If the community is interested, let's use this thread to discuss any specific requirements or suggestions that others might have, and after a few days I would like to start prototyping this functionality.
> > > > > > > > >
> > > > > > > > > Thoughts?
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Luciano Resende
> > > > > > > > > http://twitter.com/lresende1975
> > > > > > > > > http://lresende.blogspot.com/
> > > >
> > > > --
> > > > Luciano Resende
> >
> > --
> > Luciano Resende
>
> --
> Luciano Resende
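
For reference, the three requirements listed in the thread (pick the Spark paragraphs, drop interactive statements such as print/show, generate a Main class) can be sketched roughly as below. This is only an illustration, not the tool discussed above: the note layout (a "paragraphs" list with "text" fields), the rule that a Spark paragraph starts with a line containing exactly %spark, the line-based filter for println/show, and the names spark_paragraphs, strip_interactive, generate_main and NoteApp are all assumptions made for the example. It only produces Scala source text; compiling and packaging the jar are left out.

```python
import re

# Assumed note layout: {"paragraphs": [{"text": "%spark\n<scala code>"}, ...]}.
# Hypothetical heuristic: a line that is only a println(...) or a ....show(...)
# call is treated as interactive and dropped.
INTERACTIVE = re.compile(r"^\s*(println\(.*\)|.*\.show\(.*\))\s*$")


def spark_paragraphs(note):
    """Return the Scala bodies of paragraphs whose first line is exactly %spark."""
    bodies = []
    for paragraph in note.get("paragraphs", []):
        text = paragraph.get("text") or ""
        first, _, rest = text.partition("\n")
        if first.strip() == "%spark":
            bodies.append(rest)
    return bodies


def strip_interactive(body):
    """Drop lines that only print or show results."""
    kept = [line for line in body.splitlines() if not INTERACTIVE.match(line)]
    return "\n".join(kept)


def generate_main(note, object_name="NoteApp"):
    """Wrap the cleaned paragraph bodies in a Scala object with a main method."""
    lines = [
        "import org.apache.spark.sql.SparkSession",
        "",
        f"object {object_name} {{",
        "  def main(args: Array[String]): Unit = {",
        "    val spark = SparkSession.builder().getOrCreate()",
    ]
    for body in spark_paragraphs(note):
        for line in strip_interactive(body).splitlines():
            if line.strip():
                lines.append("    " + line)
    lines += ["    spark.stop()", "  }", "}"]
    return "\n".join(lines)


if __name__ == "__main__":
    demo_note = {
        "paragraphs": [
            {"text": "%md\n# Notes"},
            {"text": "%spark\nval df = spark.range(10)\ndf.show()\nprintln(df.count())"},
        ]
    }
    print(generate_main(demo_note))
```

A line-based filter like this is deliberately crude: it would miss multi-line statements and would also drop a show() whose result is actually used, which is one reason the debugging concern raised earlier in the thread (the exported code no longer matches the notebook) is worth keeping in mind.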