[
https://issues.apache.org/jira/browse/BEAM-14?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Davor Bonaci updated BEAM-14:
-----------------------------
Component/s: beam-model
> Add data integration DSL
> ------------------------
>
> Key: BEAM-14
> URL: https://issues.apache.org/jira/browse/BEAM-14
> Project: Beam
> Issue Type: New Feature
> Components: beam-model
> Reporter: Jean-Baptiste Onofré
> Assignee: Jean-Baptiste Onofré
>
> Even if users would still be able to use directly the API, it would be great
> to provide a DSL on top of the API covering batch and streaming data
> processing but also data integration.
> Instead of designing a pipeline as a chain of apply() wrapping function
> (DoFn), we can provide a fluent DSL allowing users to directly leverage
> keyturn functions.
> For instance, an user would be able to design a pipeline like:
> {code}
> .from(“kafka:localhost:9092?topic=foo”).reduce(...).split(...).wiretap(...).map(...).to(“jms:queue:foo….”);
> {code}
> The DSL will allow to use existing pipelines, for instance:
> {code}
> .from("cxf:...").reduce().pipeline("other").map().to("kafka:localhost:9092?topic=foo&acks=all")
> {code}
> So it means that we will have to create a IO Sink that can trigger the
> execution of a target pipeline: (from("trigger:other") triggering the
> pipeline execution when another pipeline design starts with
> pipeline("other")). We can also imagine to mix the runners: the pipeline()
> can be on one runner, the from("trigger:other") can be on another runner).
> It's not trivial, but it will give strong flexibility and key value for Beam.
> In a second step, we can provide DSLs in different languages (the first one
> would be Java, but why not providing XML, akka, scala DSLs).
> We can note in previous examples that the DSL would also provide data
> integration support to bean in addition of data processing. Data Integration
> is an extension of Beam API to support some Enterprise Integration Patterns
> (EIPs). As we would need metadata for data integration (even if metadata can
> also be interesting in stream/batch data processing pipeline), we can provide
> a DataxMessage built on top of PCollection. A DataxMessage would contain:
> structured headers
> binary payload
> For instance, the headers can contains an Avro schema to describe the payload.
> The headers can also contains useful information coming from the IO Source
> (for instance the partition/path where the data comes from, …).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)