Re: Apache Zeppelin Beam Integration

Neville Li Tue, 17 May 2016 10:33:36 -0700

The RDD API is only tied to the SparkInterpreter.

Scio will probably have its own interpreter, but with a configurable runner
so users can also leverage Dataflow (very attractive to us) or Flink.


On Tue, May 17, 2016 at 11:12 AM Jean-Baptiste Onofré <[email protected]>
wrote:

> Hi Ismael,
>
> Even if 1. is probably the easiest way, I think it would require some
> changes on the Zeppelin side anyway. AFAIK, Zeppelin directly leverages
> the RDD API, so it's tied to the Spark API.
>
> So we may need to change the Zeppelin backend a bit to abstract the
> current RDD usage into PCollection.
>
> My $0.01
>
> Regards
> JB
>
> On 05/17/2016 03:03 PM, Ismaël Mejía wrote:
> > Last week during the Apache Big Data / ApacheCon conference I attended
> > some presentations, and one aspect that surprised me is how Apache
> > Zeppelin was used by many presenters to show their data processing code
> > (mostly in Python/Scala).
> >
> > I consider that even if this integration is not critical for Apache
> > Beam, it is important to support it, and I intend to collaborate on this
> > task. I just created an issue on JIRA for the people interested:
> > https://issues.apache.org/jira/browse/BEAM-290
> >
> > I briefly discussed with Alexander Bezzubov from Zeppelin an initial
> > plan to support Beam in three phases:
> >
> > 1. support the scala sdk (scio) + scala runners (spark):
> >
> > This is first since most of the pieces already exist; we just need to
> > put them together.
> >
> > 2. integrate the java sdk
> >
> > The big issue here is that there is not (yet) a decent Java REPL tool,
> > and support for such a REPL in Zeppelin is ongoing work.
> >
> > 3. integrate the python sdk
> >
> > This one depends on the release of the Python SDK in the upcoming weeks,
> > and its priority can change if integration is easier than the other two
> > tasks.
> >
> > Of course this message is a call to other interested parties to
> > contribute, e.g. ideas, an agenda to prioritize certain runners, or other
> > complementary tasks to achieve the goals, like integrating scio or
> > supporting the Google storage backend for the notebooks (to make a nicer
> > integration for users of the runner in Google Cloud), etc.
> >
> > Ismaël Mejía
> >
>
> --
> Jean-Baptiste Onofré
> [email protected]
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
