Ahh, yes, that is what I mean: I want syntactic sugar (or docs ;) ), and the .putResource / .getResource methods available to all interpreters. Common API things that are handled commonly (instead of one-off in each interpreter).
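To make the ask concrete, here is a minimal sketch in plain Java of what that sugar could look like: an InterpreterContext delegating putResource/getResource straight to a shared pool. All class internals here are hypothetical stand-ins, not Zeppelin's actual implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for Zeppelin's resource pool (the real one is
// org.apache.zeppelin.resource.ResourcePool; this is just a sketch).
class ResourcePool {
    private final Map<String, Object> resources = new ConcurrentHashMap<>();
    public void put(String name, Object value) { resources.put(name, value); }
    public Object get(String name) { return resources.get(name); }
    public Object remove(String name) { return resources.remove(name); }
}

// Hypothetical stand-in for InterpreterContext with the proposed sugar:
// every interpreter gets the same API instead of one-off implementations.
class InterpreterContext {
    private static final InterpreterContext INSTANCE = new InterpreterContext();
    private final ResourcePool pool = new ResourcePool();
    public static InterpreterContext get() { return INSTANCE; }

    public void putResource(String name, Object value) { pool.put(name, value); }
    public Object getResource(String name) { return pool.get(name); }
    public Object removeResource(String name) { return pool.remove(name); }
}

public class ResourceSugarSketch {
    public static void main(String[] args) {
        // e.g. a %flink paragraph stores a value...
        InterpreterContext.get().putResource("foo", "bar");
        // ...and a %spark paragraph reads it back through the same API.
        System.out.println(InterpreterContext.get().getResource("foo"));
    }
}
```

The point of the delegation is that no interpreter needs to know about the pool's implementation; each one only sees the common put/get/remove surface on the context it already receives.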
I think there is still some value to doing something similar with the interpreters (e.g. line parsing in Scala works basically the same across all; see https://github.com/apache/incubator-zeppelin/pull/794). But I'm not going to be the one to fix it, and it's not really causing me any real trouble right now. Thanks DuyHai, and all. tg Trevor Grant Data Scientist https://github.com/rawkintrevo http://stackexchange.com/users/3002022/rawkintrevo http://trevorgrant.org *"Fortunate is he, who is able to know the causes of things." -Virgil* On Sat, Apr 23, 2016 at 3:05 AM, DuyHai Doan <doanduy...@gmail.com> wrote: > One idea > > Once we merge AngularObjectRegistry with ResourcePool, it will be a good > idea to expose some utility methods like 'getResource(xxx)', > 'putResource(yyy)' and 'removeResource(zzz)' directly on the > InterpreterContext object so that any interpreter can use them > > > > On Sat, Apr 23, 2016 at 9:59 AM, DuyHai Doan <doanduy...@gmail.com> wrote: > > > "I'd like to see that Flink has access to the 'z' object." > > > > --> You're looking at the problem from the wrong side. > > > > You need to access the 'z' object not for the object itself but to be > able > > to call its functions, namely 'z.angular(xxx)', right? > > > > If you look at the source code, the AngularObjectRegistry is available > > from the InterpreterContext object itself, with a little bit > > of code, see here: > > > https://github.com/apache/incubator-zeppelin/blob/master/spark/src/main/java/org/apache/zeppelin/spark/ZeppelinContext.java#L370-L384 > > > > So basically, inside the Flink interpreter, you can call this > > piece of code as well and achieve the same goal. > > > > The 'z.angular()' method is merely syntactic sugar to simplify > > AngularObjectRegistry interaction. > > > > "But the Angular binds don't need to be Spark specific (e.g. living in > > the ZeppelinContext which requires a SparkContext as a constructor)."
> > > > --> And it isn't Spark specific; it can be retrieved from > > InterpreterContext itself. > > > > > > On Sat, Apr 23, 2016 at 12:27 AM, Trevor Grant <trevor.d.gr...@gmail.com > > > > wrote: > > > >> First of all, awesome work on what you've done here. Appreciating it > more > >> and more, the more I grok. > >> > >> Second of all, thanks for the Cassandra snippet. I realized we are > talking > >> about slightly different things. > >> You are talking about ${var} > >> > >> I wanted something closer to this: > >> > >> %flink > >> import org.apache.zeppelin.interpreter.InterpreterContext > >> val resourcePool = InterpreterContext.get().getResourcePool() > >> resourcePool.put("foo", "bar") > >> > >> import org.apache.zeppelin.interpreter.InterpreterContext > >> resourcePool: org.apache.zeppelin.resource.ResourcePool = > >> org.apache.zeppelin.resource.DistributedResourcePool@21d07d88 > >> > >> ---------------------------------- > >> %spark z.get("foo") > >> > >> res4: Object = bar > >> > >> ^^ This actually works, so I can move on with my day. > >> > >> Continuing the discussion: > >> > >> I'd like to see that Flink has access to the 'z' object. OR, if that > is > >> deprecated, I hope to see something calling this out in your PR of > >> documentation, e.g. using resource pools. I'm not a complete idiot, but > it > >> took me some time to dig through code to figure this one out (and > comments > >> of this thread). I think variable passing is one of the coolest things > of > >> a Zeppelin setup. People should be aware that it's a thing and how to > do it. > >> > >> Re: Zeppelin being Spark-centric. I say that because the zeppelin > context > >> is really wrapped up in the Spark interpreter and vice versa.
For cripes > >> sake, the SparkContext is required for the constructor of the Zeppelin > >> Context: > >> (This isn't related to your pull request / fine work) > >> > >> Currently it is something like this: > >> > >> class SparkInterpreter { > >> // basic interpreter stuff > >> // fancy interpreter fixes > >> // special Zeppelin interpreter magic > >> } > >> > >> class ZeppelinContext( SparkContext ) { > >> // all the binding / watching / other cool stuff > >> } > >> > >> class FlinkInterpreter { > >> // basic interpreter stuff > >> } > >> > >> class IgniteInterpreter { > >> // basic interpreter stuff, but not standardized so patches and fixes > >> don't always work as expected and now all interpreters have slightly > >> different implementations because they aren't homogenized. > >> } > >> > >> > >> I propose something more like this: > >> class ZeppelinIntp { > >> // common resource pools > >> // etc > >> } > >> object ZeppelinIntp { > >> // common resource pools > >> } > >> > >> class ScalaIntp { > >> // everything for a well-oiled and highly functioning scala > interpreter > >> } > >> > >> object SparkScalaIntp extends ScalaIntp (sparkParams, ZeppelinIntp, > ...){ > >> // do spark specific things > >> } > >> > >> object FlinkScalaIntp extends ScalaIntp (flinkParams, ZeppelinIntp, > ...){ > >> // do flink specific things > >> } > >> > >> object IgniteScalaIntp extends ScalaIntp (igniteParams, ZeppelinIntp, > >> ...){ > >> // do ignite specific things > >> } > >> > >> Yea, I know this is a major refactor, but the problem is going to get > >> worse > >> as time goes on. > >> > >> The ZeppelinContext/SparkContext pair may not be worth splitting out; those > >> two are really entangled, and for any conceivable case the most we would > >> want to pass back and forth can be handled by the resource pools. But > the > >> Angular binds don't need to be Spark specific (e.g. living in the > >> ZeppelinContext which requires a SparkContext as a constructor).
If > >> anything it would make more sense for those to live inside Flink because it > is > >> true streaming as opposed to Spark mini-batching (which comes to the > >> scala-shell in v1.1). > >> > >> Also, I really believe the overarching classes that handle language > >> behavior and parsing ought to be off in their own modules. > >> > >> Possibly a thing for v 0.7? > >> > >> > >> > >> > >> > >> Trevor Grant > >> Data Scientist > >> https://github.com/rawkintrevo > >> http://stackexchange.com/users/3002022/rawkintrevo > >> http://trevorgrant.org > >> > >> *"Fortunate is he, who is able to know the causes of things." -Virgil* > >> > >> > >> On Fri, Apr 22, 2016 at 4:37 PM, DuyHai Doan <doanduy...@gmail.com> > >> wrote: > >> > >> > "Back to my original post, I essentially want to add Flink to that > list" > >> > > >> > In that case, inside the Flink interpreter source code, every time the > >> input > >> > parser encounters a ${variable} pattern, you have to access the > >> > AngularObjectRegistry and replace the template with the actual variable > >> > value. > >> > > >> > It is the responsibility of each interpreter to implement variable > >> > interpolation (${var}). > >> > > >> > I did it for the Cassandra interpreter using my own syntax ( {{var}} ): > >> > > >> > > >> > https://github.com/apache/incubator-zeppelin/blob/master/cassandra/src/main/scala/org/apache/zeppelin/cassandra/InterpreterLogic.scala#L306-L327 > >> > > >> > > >> > "was looking through your resourcePools. I am under the impression I > can > >> > use > >> > those to pass a variable from one paragraph to another, in an awkward > >> sort > >> > of fashion (but I may be going about it all wrong). Supposing that can > be > >> > done (or possibly is already done, but I haven't read the PRs you > >> > listed carefully), > >> > it would solve what I want to do for the time being." > >> > > >> > I will create an epic to merge angular objects with resource pools to > >> keep > >> > only one abstraction.
But it doesn't solve the fundamental problem, > >> which > >> > is IF an interpreter wants to use variables stored in the resource pool, > it > >> HAS > >> > to implement it. > >> > > >> > The only way we can mutualise code for variable binding is to let > >> Zeppelin > >> > Engine pre-process the input text block of each paragraph and perform > >> > variable lookup from the Resource Pool, then variable replacement, and after > that > >> > forward the text block to the interpreter itself. > >> > > >> > I think it is a good idea but it would require some refactoring and > may > >> > break existing behaviors if some interpreters have already implemented their > >> own > >> > variable template handling. > >> > > >> > > >> > "2) If we want to keep the code base compact and clean, would it be > >> wiser > >> > to refactor in a less Spark-centric way?" > >> > > >> > There is nothing Spark-centric here if we're talking about variable > >> > sharing; it applies to all interpreters. > >> > > >> > On Fri, Apr 22, 2016 at 11:24 PM, Trevor Grant < > >> trevor.d.gr...@gmail.com> > >> > wrote: > >> > > >> > > If I'm reading https://issues.apache.org/jira/browse/ZEPPELIN-635 > >> > > correctly, this integrates the spark, markdown, and shell > >> interpreters. > >> > > > >> > > Back to my original post, I essentially want to add Flink to that > >> list. > >> > > > >> > > To your point about keeping a small and manageable code-base: Under > >> the > >> > > hood it seems like Zeppelin is a front end for Spark and oh btw, here > >> are > >> > > some hacks to make other stuff work too. For instance there is a lot > >> of > >> > > code reuse in any Scala-based interpreter. Wouldn't it make more > >> sense > >> > > to have a generic Scala interpreter and extend it for the special quirks > >> of > >> > > each interpreter as needed, e.g. for the variable bindings of the > >> > > particular interpreter, and loading configurations? Consider the
Consider the > >> > companion > >> > > object bug, essentially the same code had to be copy and pasted > >> across 4 > >> > > interpreters, and the Ignite interpreter (as I recall) never even > got > >> the > >> > > fix because of a quirk in the way the tests are written for that > >> > > interpreter. > >> > > > >> > > I was looking through your resourcePools. I am under the impression > I > >> can > >> > > use those to pass a variable from one paragraph to another, in an > >> akward > >> > > sort of fasion (but I may be going about it all wrong). Supposing > that > >> > can > >> > > be done (or possibly is already done, but I haven't read the PRs you > >> > listed > >> > > carefully), it would solve what I want to do for the time being. > >> > > > >> > > Also consider the Python Flink I want to add to this, there will > once > >> > again > >> > > be a lot of duplication of code from the Spark Python interpreter. > A > >> > > generic Python interpreter also seems like a more reasonable > approach > >> > here. > >> > > > >> > > So basically I've broken this conversation into two parts- > >> > > 1) I'm trying to pass variables/object back and forth between > >> > > Spark/Flink/Angular/etc. Please help. Seems possible but I'm having > a > >> > slow > >> > > time figuring it out > >> > > 2) If we want to keep the code base compact and clean, would it be > >> wiser > >> > to > >> > > refactor in a less Spark-centric way? > >> > > > >> > > > >> > > > >> > > > >> > > Trevor Grant > >> > > Data Scientist > >> > > https://github.com/rawkintrevo > >> > > http://stackexchange.com/users/3002022/rawkintrevo > >> > > http://trevorgrant.org > >> > > > >> > > *"Fortunate is he, who is able to know the causes of things." > >> -Virgil* > >> > > > >> > > > >> > > On Fri, Apr 22, 2016 at 3:41 PM, DuyHai Doan <doanduy...@gmail.com> > >> > wrote: > >> > > > >> > > > In this case, it is already implemented. 
> >> > > > > >> > > > Look at these merged PRs: > >> > > > > >> > > > - https://github.com/apache/incubator-zeppelin/pull/739 > >> > > > - https://github.com/apache/incubator-zeppelin/pull/740 > >> > > > - https://github.com/apache/incubator-zeppelin/pull/741 > >> > > > - https://github.com/apache/incubator-zeppelin/pull/742 > >> > > > - https://github.com/apache/incubator-zeppelin/pull/744 > >> > > > - https://github.com/apache/incubator-zeppelin/pull/745 > >> > > > - https://github.com/apache/incubator-zeppelin/pull/832 > >> > > > > >> > > > There is one last JIRA pending for documentation, I'll do a PR for > >> this > >> > > > next week: https://issues.apache.org/jira/browse/ZEPPELIN-742 > >> > > > > >> > > > On Fri, Apr 22, 2016 at 9:52 PM, Trevor Grant < > >> > trevor.d.gr...@gmail.com> > >> > > > wrote: > >> > > > > >> > > > > I want to be able to put/get/watch variables. Specifically so I > >> can > >> > > > > interface with AngularJS for visualizations. > >> > > > > > >> > > > > I've been grokking the codebase trying to find a less invasive way > >> to > >> > do > >> > > > > this. > >> > > > > > >> > > > > I get wanting to keep the code base clean but sharing variables > >> is a > >> > > > really > >> > > > > nice feature set and shouldn't be that hard to implement? > >> > > > > > >> > > > > Thoughts? > >> > > > > > >> > > > > Trevor Grant > >> > > > > Data Scientist > >> > > > > https://github.com/rawkintrevo > >> > > > > http://stackexchange.com/users/3002022/rawkintrevo > >> > > > > http://trevorgrant.org > >> > > > > > >> > > > > *"Fortunate is he, who is able to know the causes of things." > >> > -Virgil* > >> > > > > > >> > > > > > >> > > > > On Fri, Apr 22, 2016 at 1:06 PM, DuyHai Doan < > >> doanduy...@gmail.com> > >> > > > wrote: > >> > > > > > >> > > > > > I think we should rather leave ZeppelinContext unmodified.
> >> > > > > > > >> > > > > > If we update ZeppelinContext for every kind of interpreter, it > >> > would > >> > > > > quickly > >> > > > > > become an unmanageable behemoth. > >> > > > > > > >> > > > > > The reason ZeppelinContext has some support for Spark is > >> > > > > > historical. Now that the project is going to gain a wider > >> audience, > we > >> > > > > should > >> > > > > > focus on keeping the code as clean and as modular as > >> possible. > >> > > > > > > >> > > > > > Can you explain which feature you want to add to > ZeppelinContext > >> > that > >> > > > > will > >> > > > > > be useful for Flink? > >> > > > > > > >> > > > > > > >> > > > > > On Fri, Apr 22, 2016 at 7:12 PM, Trevor Grant < > >> > > > trevor.d.gr...@gmail.com> > >> > > > > > wrote: > >> > > > > > > >> > > > > > > If one were to extend the Zeppelin context for Flink, I was > >> > > thinking > it > >> > > > > > > would make the most sense to update > >> > > > > > > > >> > > > > > > > >> > > > ../spark/src/main/java/org/apache/zeppelin/spark/ZeppelinContext.java > >> > > > > > > > >> > > > > > > Any thoughts from those who are more familiar with that end of > >> > the > >> > > > code > >> > > > > > > base than I? > >> > > > > > > > >> > > > > > > Ideally we'd have a solution that extends the Zeppelin > Context > >> to > >> > > all > >> > > > > > > interpreters. I know y'all love Spark but there ARE others > >> out > >> > > > > there... > >> > > > > > > > >> > > > > > > Anyone have any branches / previous attempts I could check > >> out? > >> > > > > > > > >> > > > > > > tg > >> > > > > > > > >> > > > > > > > >> > > > > > > Trevor Grant > >> > > > > > > Data Scientist > >> > > > > > > https://github.com/rawkintrevo > >> > > > > > > http://stackexchange.com/users/3002022/rawkintrevo > >> > > > > > > http://trevorgrant.org > >> > > > > > > > >> > > > > > > *"Fortunate is he, who is able to know the causes of > things."
> >> > > > -Virgil* > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > > > > >
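[The per-interpreter ${var} interpolation discussed in this thread (and the engine-level pre-processing DuyHai proposes, where Zeppelin would substitute variables from the resource pool before forwarding the paragraph to the interpreter) boils down to something like the following sketch. This is hypothetical standalone Java, not Zeppelin's actual code; the class and method names are made up for illustration.]

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of paragraph pre-processing: scan the text for ${var} patterns
// and substitute values looked up from a resource pool before the text
// reaches the interpreter. Names are hypothetical, not Zeppelin's API.
public class VariableInterpolation {
    private static final Pattern VAR = Pattern.compile("\\$\\{(\\w+)\\}");

    static String interpolate(String paragraph, Map<String, Object> pool) {
        Matcher m = VAR.matcher(paragraph);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Object value = pool.get(m.group(1));
            // Leave the template untouched when the variable is not bound,
            // so an interpreter's own handling (if any) still sees it.
            String replacement = (value == null) ? m.group(0) : value.toString();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> pool = new HashMap<>();
        pool.put("table", "events");
        System.out.println(interpolate("SELECT * FROM ${table} WHERE ${missing}", pool));
    }
}
```

Doing this once in the engine would avoid each interpreter reimplementing the lookup-and-replace step, at the cost of the compatibility concern raised above for interpreters (like Cassandra's {{var}}) that already handle templates themselves.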