I just sent a pull request that adds a bounded source to Beam for reading
DistributedLog streams: https://github.com/apache/incubator-beam/pull/1464

I would appreciate any review comments.

- KN

On Wed, Aug 31, 2016 at 2:10 AM, Jean-Baptiste Onofré <[email protected]>
wrote:

> Hi Khurrum,
>
> I already replied in the Jira this morning.
>
> To write the IO, the first questions are whether it is bounded or unbounded
> and which features you want to provide.
>
> An IO could be a wrapper around a simple DoFn.
>
> If you want to provide advanced features like:
> - watermark/skew management for unbounded source
> - estimated size and split for bounded source
> then you can use the Source API.
>
> You can take a look at the existing IOs:
> - JMS, Kafka, PubSub for unbounded
> - Bigtable, MongoDB for bounded
>
> We are preparing some documentation on the Beam website about that.
>
> In the meantime, you can take a look at the Dataflow Custom IO
> documentation:
>
> https://cloud.google.com/dataflow/model/custom-io-java
>
> It's basically the same as in Beam.
>
> Anyway, please let me know; I would be more than happy to help you with
> this!
>
> I'm looking forward to working with you on this!
>
> Regards
> JB
>
>
> On 08/31/2016 11:02 AM, Khurrum Nasim wrote:
>
>> Hello beam folks,
>>
>> We are evaluating a new solution to unify our streaming and batching data
>> pipeline, from storage, computing engine to programming model. The idea is
>> basically to implement the Kappa architecture, using DistributedLog as a
>> unified stream store for both streaming and batching, using Flink or Spark
>> (still debating) as the processing engine, and using Beam as the programming
>> model.
>>
>> We'd like to contribute an IO connector to DistributedLog (both bounded
>> source/sink and unbounded source/sink).
>>
>> Are there any special instructions or best practices for adding a new IO
>> connector? Any suggestions would be much appreciated.
>>
>> The jira is here: https://issues.apache.org/jira/browse/BEAM-607
>>
>> Also, /cc the DistributedLog team for any help.
>>
>> KN
>>
>>
> --
> Jean-Baptiste Onofré
> [email protected]
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
