Kenneth Knowles updated BEAM-14:
Component/s: (was: sdk-java-extensions)
> Add declarative DSLs (XML & JSON)
> Key: BEAM-14
> URL: https://issues.apache.org/jira/browse/BEAM-14
> Project: Beam
> Issue Type: New Feature
> Components: sdk-ideas
> Reporter: Jean-Baptiste Onofré
> Assignee: Sajeevan Achuthan
> Priority: Major
> Even if users would still be able to use directly the API, it would be great
> to provide a DSL on top of the API covering batch and streaming data
> processing but also data integration.
> Instead of designing a pipeline as a chain of apply() wrapping function
> (DoFn), we can provide a fluent DSL allowing users to directly leverage
> keyturn functions.
> For instance, an user would be able to design a pipeline like:
> The DSL will allow to use existing pipelines, for instance:
> So it means that we will have to create a IO Sink that can trigger the
> execution of a target pipeline: (from("trigger:other") triggering the
> pipeline execution when another pipeline design starts with
> pipeline("other")). We can also imagine to mix the runners: the pipeline()
> can be on one runner, the from("trigger:other") can be on another runner).
> It's not trivial, but it will give strong flexibility and key value for Beam.
> In a second step, we can provide DSLs in different languages (the first one
> would be Java, but why not providing XML, akka, scala DSLs).
> We can note in previous examples that the DSL would also provide data
> integration support to bean in addition of data processing. Data Integration
> is an extension of Beam API to support some Enterprise Integration Patterns
> (EIPs). As we would need metadata for data integration (even if metadata can
> also be interesting in stream/batch data processing pipeline), we can provide
> a DataxMessage built on top of PCollection. A DataxMessage would contain:
> structured headers
> binary payload
> For instance, the headers can contains an Avro schema to describe the payload.
> The headers can also contains useful information coming from the IO Source
> (for instance the partition/path where the data comes from, …).
This message was sent by Atlassian JIRA