First of all, awesome work on what you've done here. I'm appreciating it more and more, the more I grok it.
Second of all, thanks for the Cassandra snippet. I realized we are talking
about slightly different things. You are talking about ${var}; I wanted
something closer to this:

%flink
import org.apache.zeppelin.interpreter.InterpreterContext
val resourcePool = InterpreterContext.get().getResourcePool()
resourcePool.put("foo", "bar")

import org.apache.zeppelin.interpreter.InterpreterContext
resourcePool: org.apache.zeppelin.resource.ResourcePool = org.apache.zeppelin.resource.DistributedResourcePool@21d07d88

----------------------------------

%spark
z.get("foo")

res4: Object = bar

^^ This actually works, so I can move on with my day.

Continuing the discussion: I'd like to see Flink have access to the 'z'
object. OR, if that is deprecated, I hope to see something calling this out
in your documentation PR, e.g. using resource pools. I'm not a complete
idiot, but it took me some time digging through code (and the comments of
this thread) to figure this one out. I think variable passing is one of the
coolest things about a Zeppelin setup. People should be aware that it's a
thing and how to do it.

Re: Zeppelin being Spark-centric. I say that because the ZeppelinContext is
really wrapped up in the Spark interpreter and vice versa. For cripes' sake,
the SparkContext is required by the constructor of the ZeppelinContext.
(This isn't related to your pull request / fine work.)

Currently it is something like this:

class SparkInterpreter {
  // basic interpreter stuff
  // fancy interpreter fixes
  // special Zeppelin interpreter magic
}

class ZeppelinContext(SparkContext) {
  // all the binding / watching / other cool stuff
}

class FlinkInterpreter {
  // basic interpreter stuff
}

class IgniteInterpreter {
  // basic interpreter stuff, but not standardized, so patches and fixes
  // don't always work as expected, and now all interpreters have slightly
  // different implementations because they aren't homogenized
}

I propose something more like this:

class ZeppelinIntp {
  // common resource pools
  // etc.
}

object ZeppelinIntp {
  // common resource pools
}

class ScalaIntp {
  // everything for a well-oiled and highly functioning Scala interpreter
}

object SparkScalaIntp extends ScalaIntp(sparkParams, ZeppelinIntp, ...) {
  // do Spark-specific things
}

object FlinkScalaIntp extends ScalaIntp(flinkParams, ZeppelinIntp, ...) {
  // do Flink-specific things
}

object IgniteScalaIntp extends ScalaIntp(igniteParams, ZeppelinIntp, ...) {
  // do Ignite-specific things
}

Yeah, I know this is a major refactor, but the problem is going to get worse
as time goes on. The ZeppelinContext / SparkContext pair may not be worth
splitting out; those two are really entangled, and for any conceivable case,
the most we would want to pass back and forth can be handled by the resource
pools. But the Angular bindings don't need to be Spark-specific (i.e. living
in the ZeppelinContext, which requires a SparkContext in its constructor).
If anything it would make more sense for those to live inside Flink, because
it is true streaming as opposed to Spark mini-batching (which comes to the
scala-shell in v1.1).

Also, I really believe the overarching classes that handle language behavior
and parsing ought to be off in their own modules. Possibly a thing for v0.7?
(A slightly more concrete sketch of this is below, after my signature.)

Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things." -Virgil*
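P.S. To make the shape of that proposal a bit more concrete, here is a
minimal Scala sketch. ZeppelinIntp, ScalaIntp, SparkScalaIntp, and
FlinkScalaIntp are just the placeholder names from the pseudocode above,
not existing Zeppelin classes, and the method bodies are stand-ins for
whatever the real engines would need:

import scala.collection.concurrent.TrieMap

// Engine-agnostic plumbing shared by every interpreter: a resource pool
// (and eventually the angular bindings, form handling, etc.).
trait ZeppelinIntp {
  private val resourcePool = TrieMap.empty[String, AnyRef]
  def put(name: String, value: AnyRef): Unit = resourcePool.update(name, value)
  def get(name: String): Option[AnyRef] = resourcePool.get(name)
}

// Everything needed for a well-oiled Scala interpreter, with no knowledge
// of Spark, Flink, or Ignite.
abstract class ScalaIntp(params: Map[String, String]) extends ZeppelinIntp {
  def open(): Unit                    // start the REPL / compiler
  def interpret(code: String): String // run a paragraph, return its output
}

// Engine-specific subclasses only carry their own quirks and bindings.
class SparkScalaIntp(params: Map[String, String]) extends ScalaIntp(params) {
  def open(): Unit = () // would create the SparkContext and bind sc, sqlContext, z
  def interpret(code: String): String = s"spark> $code"
}

class FlinkScalaIntp(params: Map[String, String]) extends ScalaIntp(params) {
  def open(): Unit = () // would create the Flink environment and bind env, z
  def interpret(code: String): String = s"flink> $code"
}

The point is only that the resource-pool and Scala-REPL plumbing could live
in one place instead of being copied, pasted, and separately bug-fixed once
per engine.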
On Fri, Apr 22, 2016 at 4:37 PM, DuyHai Doan <doanduy...@gmail.com> wrote:

> "Back to my original post, I essentially want to add Flink to that list"
>
> In that case, inside the Flink interpreter source code, every time the
> input parser encounters a ${variable} pattern, you have to access the
> AngularObjectRegistry and replace the template by the actual variable
> value.
>
> It is the responsibility of each interpreter to implement variable
> interpolation (${var}).
>
> I did it for the Cassandra interpreter using my own syntax ({{var}}):
>
> https://github.com/apache/incubator-zeppelin/blob/master/cassandra/src/main/scala/org/apache/zeppelin/cassandra/InterpreterLogic.scala#L306-L327
>
> "I was looking through your resourcePools. I am under the impression I
> can use those to pass a variable from one paragraph to another, in an
> awkward sort of fashion (but I may be going about it all wrong).
> Supposing that can be done (or possibly is already done, but I haven't
> read the PRs you listed carefully), it would solve what I want to do for
> the time being."
>
> I will create an epic to merge angular objects with resource pools to
> keep only one abstraction. But it doesn't solve the fundamental problem,
> which is: IF an interpreter wants to use variables stored in the resource
> pool, it HAS to implement it.
>
> The only way we can mutualise code for variable binding is to let the
> Zeppelin engine pre-process the input text block of each paragraph,
> perform the variable lookup from the resource pool and the variable
> replacement, and after that forward the text block to the interpreter
> itself.
>
> I think it is a good idea, but it would require some refactoring and may
> break existing behaviors if some interpreters have already implemented
> their own variable template handling.
>
> "2) If we want to keep the code base compact and clean, would it be wiser
> to refactor in a less Spark-centric way?"
>
> There is nothing Spark-centric here if we're talking about variable
> sharing; it applies to all interpreters.
>
> On Fri, Apr 22, 2016 at 11:24 PM, Trevor Grant <trevor.d.gr...@gmail.com>
> wrote:
>
> > If I'm reading https://issues.apache.org/jira/browse/ZEPPELIN-635
> > correctly, this integrates the Spark, Markdown, and shell interpreters.
> >
> > Back to my original post, I essentially want to add Flink to that list.
> >
> > To your point about keeping a small and manageable code base: under the
> > hood it seems like Zeppelin is a front end for Spark and, oh btw, here
> > are some hacks to make other stuff work too. For instance there is a
> > lot of code reuse in any Scala-based interpreter. Wouldn't it make more
> > sense to have a generic Scala interpreter and extend it for the special
> > quirks of each interpreter as needed, e.g. for the variable bindings of
> > the particular interpreter and loading configurations? Consider the
> > companion object bug: essentially the same code had to be copied and
> > pasted across 4 interpreters, and the Ignite interpreter (as I recall)
> > never even got the fix because of a quirk in the way the tests are
> > written for that interpreter.
> >
> > I was looking through your resourcePools. I am under the impression I
> > can use those to pass a variable from one paragraph to another, in an
> > awkward sort of fashion (but I may be going about it all wrong).
> > Supposing that can be done (or possibly is already done, but I haven't
> > read the PRs you listed carefully), it would solve what I want to do
> > for the time being.
> >
> > Also consider the Python Flink I want to add to this; there will once
> > again be a lot of duplication of code from the Spark Python
> > interpreter. A generic Python interpreter also seems like a more
> > reasonable approach here.
> >
> > So basically I've broken this conversation into two parts:
> > 1) I'm trying to pass variables/objects back and forth between
> > Spark/Flink/Angular/etc. Please help. Seems possible, but I'm having a
> > slow time figuring it out.
> > 2) If we want to keep the code base compact and clean, would it be
> > wiser to refactor in a less Spark-centric way?
> >
> > Trevor Grant
> > Data Scientist
> > https://github.com/rawkintrevo
> > http://stackexchange.com/users/3002022/rawkintrevo
> > http://trevorgrant.org
> >
> > *"Fortunate is he, who is able to know the causes of things." -Virgil*
> >
> > On Fri, Apr 22, 2016 at 3:41 PM, DuyHai Doan <doanduy...@gmail.com>
> > wrote:
> >
> > > In this case, it is already implemented.
> > >
> > > Look at those merged PRs:
> > >
> > > - https://github.com/apache/incubator-zeppelin/pull/739
> > > - https://github.com/apache/incubator-zeppelin/pull/740
> > > - https://github.com/apache/incubator-zeppelin/pull/741
> > > - https://github.com/apache/incubator-zeppelin/pull/742
> > > - https://github.com/apache/incubator-zeppelin/pull/744
> > > - https://github.com/apache/incubator-zeppelin/pull/745
> > > - https://github.com/apache/incubator-zeppelin/pull/832
> > >
> > > There is one last JIRA pending for documentation; I'll do a PR for
> > > this next week: https://issues.apache.org/jira/browse/ZEPPELIN-742
> > >
> > > On Fri, Apr 22, 2016 at 9:52 PM, Trevor Grant <trevor.d.gr...@gmail.com>
> > > wrote:
> > >
> > > > I want to be able to put/get/watch variables, specifically so I can
> > > > interface with AngularJS for visualizations.
> > > >
> > > > I've been grokking the codebase trying to find a less invasive way
> > > > to do this.
> > > >
> > > > I get wanting to keep the code base clean, but sharing variables is
> > > > a really nice feature set and shouldn't be that hard to implement?
> > > >
> > > > Thoughts?
> > > >
> > > > Trevor Grant
> > > > Data Scientist
> > > > https://github.com/rawkintrevo
> > > > http://stackexchange.com/users/3002022/rawkintrevo
> > > > http://trevorgrant.org
> > > >
> > > > *"Fortunate is he, who is able to know the causes of things." -Virgil*
> > > >
> > > > On Fri, Apr 22, 2016 at 1:06 PM, DuyHai Doan <doanduy...@gmail.com>
> > > > wrote:
> > > >
> > > > > I think we should rather leave ZeppelinContext unmodified.
> > > > >
> > > > > If we update ZeppelinContext for every kind of interpreter, it
> > > > > would quickly become a behemoth and unmanageable.
> > > > >
> > > > > The reason ZeppelinContext has some support for Spark is
> > > > > historical. Now that the project is going to gain a wider
> > > > > audience, we should focus on keeping the code as clean and as
> > > > > modular as possible.
> > > > >
> > > > > Can you explain which feature you want to add to ZeppelinContext
> > > > > that will be useful for Flink?
> > > > >
> > > > > On Fri, Apr 22, 2016 at 7:12 PM, Trevor Grant <trevor.d.gr...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > If one were to extend the Zeppelin context for Flink, I was
> > > > > > thinking it would make the most sense to update
> > > > > >
> > > > > > ../spark/src/main/java/org/apache/zeppelin/spark/ZeppelinContext.java
> > > > > >
> > > > > > Any thoughts from those who are more familiar with that end of
> > > > > > the code base than I?
> > > > > >
> > > > > > Ideally we'd have a solution that extends the Zeppelin context
> > > > > > to all interpreters. I know y'all love Spark, but there ARE
> > > > > > others out there...
> > > > > >
> > > > > > Anyone have any branches / previous attempts I could check out?
> > > > > >
> > > > > > tg
> > > > > >
> > > > > > Trevor Grant
> > > > > > Data Scientist
> > > > > > https://github.com/rawkintrevo
> > > > > > http://stackexchange.com/users/3002022/rawkintrevo
> > > > > > http://trevorgrant.org
> > > > > >
> > > > > > *"Fortunate is he, who is able to know the causes of things." -Virgil*
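For reference, since the whole thread keeps coming back to it: a minimal
Scala sketch of the per-interpreter ${var} interpolation DuyHai describes
above (scan the paragraph text, look each name up, substitute, then hand
the result to the engine). The object and function names here are
illustrative only, and a plain String => Option[String] lookup stands in
for the AngularObjectRegistry / resource pool; the Cassandra interpreter
linked above is the real reference implementation.

import scala.util.matching.Regex

object VariableInterpolation {
  // Matches ${identifier} templates inside a paragraph.
  private val Template: Regex = """\$\{([A-Za-z_][A-Za-z0-9_]*)\}""".r

  // Replace each ${name} with its looked-up value; unknown names are left as-is.
  def interpolate(text: String, lookup: String => Option[String]): String =
    Template.replaceAllIn(text, m =>
      Regex.quoteReplacement(lookup(m.group(1)).getOrElse(m.matched)))

  def main(args: Array[String]): Unit = {
    val pool = Map("table" -> "events", "limit" -> "10") // stand-in for the pool
    println(interpolate("SELECT * FROM ${table} LIMIT ${limit}", pool.get))
    // prints: SELECT * FROM events LIMIT 10
  }
}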