I just sent a pull request adding a bounded source to Beam for reading DistributedLog streams: https://github.com/apache/incubator-beam/pull/1464
I would appreciate any review comments.

- KN

On Wed, Aug 31, 2016 at 2:10 AM, Jean-Baptiste Onofré <[email protected]> wrote:

> Hi Khurrum,
>
> I already replied in the Jira this morning.
>
> To write the IO, the first question is whether it is bounded or unbounded,
> and which features you want to provide.
>
> An IO could be a wrapper around a simple DoFn.
>
> If you want to provide advanced features like:
> - watermark/skew management for an unbounded source
> - estimated size and split for a bounded source
> then you can use the Source API.
>
> You can take a look at the existing IOs:
> - JMS, Kafka, PubSub for unbounded
> - Bigtable, MongoDB for bounded
>
> We are preparing some documentation on the Beam website about that.
>
> In the meantime, you can take a look at the Dataflow Custom IO
> documentation:
>
> https://cloud.google.com/dataflow/model/custom-io-java
>
> It's basically the same as in Beam.
>
> Anyway, please let me know, I would be more than happy to help you with
> this!
>
> I'm looking forward to working with you on this!
>
> Regards
> JB
>
>
> On 08/31/2016 11:02 AM, Khurrum Nasim wrote:
>
>> Hello Beam folks,
>>
>> We are evaluating a new solution to unify our streaming and batching data
>> pipeline, from storage and computing engine to programming model. The idea
>> is basically to implement the Kappa architecture, using DistributedLog as a
>> unified stream store for both streaming and batching, using Flink or Spark
>> (still debating) as the processing engine, and using Beam as the programming
>> model.
>>
>> We'd like to contribute an IO connector for DistributedLog (both bounded
>> source/sink and unbounded source/sink).
>>
>> Are there any special instructions or best practices for adding a new IO
>> connector? Any suggestions are much appreciated.
>>
>> The jira is here: https://issues.apache.org/jira/browse/BEAM-607
>>
>> Also, /cc the DistributedLog team for any help.
>>
>> KN
>>
>>
> --
> Jean-Baptiste Onofré
> [email protected]
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
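
As a rough illustration of the "estimated size and split" features mentioned above for a bounded source, here is a minimal Java sketch. It does not use the Beam SDK and is not the code in the pull request: `LogRangeSource`, its fields, and the idea of addressing a stream by record offsets are hypothetical names invented for this example, and the average record size is an assumed input rather than something measured from a real stream.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in for a bounded source over a contiguous range of
 * log records. This is NOT the Beam Source API; it only models the two
 * features discussed in the thread: size estimation and splitting.
 */
final class LogRangeSource {
    final long startOffset;    // first record in the range (inclusive)
    final long endOffset;      // end of the range (exclusive)
    final long avgRecordBytes; // assumed average record size in bytes

    LogRangeSource(long startOffset, long endOffset, long avgRecordBytes) {
        if (startOffset < 0 || endOffset < startOffset || avgRecordBytes <= 0) {
            throw new IllegalArgumentException("invalid range or record size");
        }
        this.startOffset = startOffset;
        this.endOffset = endOffset;
        this.avgRecordBytes = avgRecordBytes;
    }

    /** Estimated size: number of records times the assumed average record size. */
    long estimatedSizeBytes() {
        return (endOffset - startOffset) * avgRecordBytes;
    }

    /** Splits the range into sub-ranges of roughly desiredBundleSizeBytes each. */
    List<LogRangeSource> split(long desiredBundleSizeBytes) {
        if (desiredBundleSizeBytes <= 0) {
            throw new IllegalArgumentException("bundle size must be positive");
        }
        // Every bundle holds at least one record, even for tiny bundle sizes.
        long recordsPerBundle = Math.max(1, desiredBundleSizeBytes / avgRecordBytes);
        List<LogRangeSource> bundles = new ArrayList<>();
        for (long s = startOffset; s < endOffset; s += recordsPerBundle) {
            long e = Math.min(s + recordsPerBundle, endOffset);
            bundles.add(new LogRangeSource(s, e, avgRecordBytes));
        }
        return bundles;
    }

    public static void main(String[] args) {
        LogRangeSource source = new LogRangeSource(0, 100, 10);
        System.out.println("estimated bytes: " + source.estimatedSizeBytes());
        for (LogRangeSource bundle : source.split(250)) {
            System.out.println("[" + bundle.startOffset + ", " + bundle.endOffset + ")");
        }
    }
}
```

A runner can use the estimated size to decide how many bundles to ask for, and then read each sub-range independently. A real connector would also need a reader per bundle and would have to derive sizes and split points from the log's actual metadata.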
