Producer)

Davor Bonaci Thu, 28 Apr 2016 10:09:48 -0700

[ Moving over to the dev@ list ]

I think we should be aiming a little higher than "trying out Beam" ;)


The Beam SDK currently has built-in IOs for Kafka, as well as for all important
Google Cloud Platform services. Additionally, there are pull requests for
Firebase and Cassandra. This is not bad, particularly taking into account
that we have APIs for users to develop their own IO connectors. Of course,
there's a long way to go, but there should *not* be any users that are
blocked or scenarios that are impossible.

In terms of runner support, the Cloud Dataflow runner supports all IOs,
including any user-written ones. Other runners don't support them as
extensively, but this is a high-priority item to address.

In my mind, we should strive to address the following:

   - Complete conversion of existing IOs to the Source / Sink API. ETA: a
   week or two for full completion.
   - Make sure Spark & Flink runners fully support Source / Sink API, and
   that ties into the new Runner / Fn API discussion.
   - Increase the set of built-in IOs. No ETA; iterative process over time.
   There are 2 pending pull requests, others in development.

I'm hopeful we can address all of these items in a relatively short period
of time -- in a few months or so -- and likely before we can call any
release "stable". (This is why the new Runner / Fn API discussions are so
important.)

In summary, in my mind, "long run" here means "< few months".

---------- Forwarded message ----------
From: Maximilian Michels <m...@apache.org>
Date: Thu, Apr 28, 2016 at 3:20 AM
Subject: Re: How to read/write avro data using FlinkKafka Consumer/Producer
(Beam Pipeline) ?
To: u...@beam.incubator.apache.org

On Wed, Apr 27, 2016 at 11:12 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:
> generally speaking, we have to check that all runners work fine with the
> provided IO. I don't think it's a good idea that the runners themselves
> implement any IO: they should use "out of the box" IO.

In the long run, big yes, and I'd like to help make it possible!
However, there is still a gap between what Beam and its Runners
provide and what users want to do. For the time being, I think the
solution we have is fine. It gives users the option to try out Beam
with sources and sinks that they expect to be available in streaming
systems.

IO timelines (Was: How to read/write avro data using FlinkKafka Consumer/Producer)
