Re: Apache Zeppelin Beam Integration

Ismaël Mejía Tue, 17 May 2016 06:30:06 -0700

You are right, Neville. My idea is to use the scio-repl to offer that kind of
semi-interactiveness, at least enough to show and execute code snippets. I
agree that the experience probably won't be exactly the same as with the Spark
interpreter, but for me the support for different runners is worth the
effort.
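
To make the semi-interactive model concrete, here is a minimal sketch in plain
Python. It is not scio or Beam code: DeferredContext, DeferredCollection,
parallelize and materialize are hypothetical names used only for illustration.
It shows the general idea that transformations are merely recorded while the
context is open, and that results become readable through a Future only after
the context has been closed.

```python
from concurrent.futures import Future


class DeferredCollection:
    """Hypothetical stand-in for a distributed collection; it only records transforms."""

    def __init__(self, context, source, transforms=()):
        self._context = context
        self._source = source
        self._transforms = tuple(transforms)

    def map(self, fn):
        # A transformation: returns a new deferred collection and runs nothing.
        return DeferredCollection(self._context, self._source, self._transforms + (fn,))

    def materialize(self):
        # Request the results; the Future stays pending until the context is closed.
        future = Future()
        self._context._pending.append((self, future))
        return future

    def _run(self):
        # Apply the recorded transforms in order (stands in for running the job).
        data = list(self._source)
        for fn in self._transforms:
            data = [fn(x) for x in data]
        return data


class DeferredContext:
    """Hypothetical context; one context corresponds to one job."""

    def __init__(self):
        self._pending = []

    def parallelize(self, items):
        return DeferredCollection(self, items)

    def close(self):
        # Closing the context "runs the job" and completes every pending Future.
        for collection, future in self._pending:
            future.set_result(collection._run())


ctx = DeferredContext()
squares = ctx.parallelize([1, 2, 3]).map(lambda x: x * x).materialize()
print(squares.done())    # False: nothing has run yet
ctx.close()
print(squares.result())  # [1, 4, 9]
```

In a notebook this would mean that a paragraph can only display data after the
job behind its context has finished, which is the main difference from
Spark-style actions.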


Another aspect that could be interesting, and that I have not explored at all,
is to see how we can integrate real-time/unbounded data pipelines, which I
imagine displayed as 'dashboards'.

Anyway, I am probably missing some technical details (no doubt), so any
feedback you or the others can give me is more than welcome.

On Tue, May 17, 2016 at 3:16 PM, Neville Li <[email protected]> wrote:

> The biggest appeal of spark in zeppelin is its interactiveness, i.e. the
> ability to pull data from RDDs to the driver/web UI via actions (take,
> collect, top).
> There is no equivalent of actions in Beam/Dataflow, only transformations
> (apply(transform)). How's that gonna work with spark?
>
> In scio-repl we have semi-interactiveness, i.e. each context corresponds to
> a Dataflow job but you have to close the context before collecting data
> back to the REPL with Future.
>
> On Tue, May 17, 2016 at 9:03 AM Ismaël Mejía <[email protected]> wrote:
>
> > Last week during the Apache Big Data / ApacheCon conference I attended some
> > presentations, and one aspect that surprised me is how many presenters used
> > Apache Zeppelin to show their data processing code (mostly in Python/Scala).
> >
> > I consider that even if this integration is not critical for Apache Beam,
> > it is important to support it, and I intend to collaborate on this task. I
> > just created a JIRA issue for the people interested:
> > https://issues.apache.org/jira/browse/BEAM-290
> >
> > I briefly discussed with Alexander Bezzubov from Zeppelin an initial plan
> > to support Beam in three phases:
> >
> > 1. support the scala sdk (scio) + scala runners (spark):
> >
> > This is first since most of the pieces exist already; we just need to put
> > things together.
> >
> > 2. integrate the java sdk
> >
> > The big issue here is that there is not (yet) a decent Java REPL tool, and
> > support for such a REPL in Zeppelin is ongoing work.
> >
> > 3. integrate the python sdk
> >
> > This one depends on the release of the Python SDK in the upcoming weeks,
> > and its priority can change if the integration is easier than the other two
> > tasks.
> >
> > Of course this message is a call to other interested parties to contribute,
> > e.g. ideas, an agenda to prioritize certain runners, or other complementary
> > tasks to achieve the goals, like integrating scio, supporting the Google
> > storage backend for the notebooks (to make a nicer integration for users of
> > the runner in the Google Cloud), etc.
> >
> > Ismaël Mejía
> >
>
