[
https://issues.apache.org/jira/browse/BEAM-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105942#comment-16105942
]
Eugene Kirpichov commented on BEAM-92:
--------------------------------------
I'm not sure whether having the current JIRA still makes sense. The PRs that
landed on this JIRA are not about a generic data-dependent sink API, they are
just examples of this idea applied to different IOs. I doubt that a complete
generalization is possible, so I think we should close this JIRA and file more
concrete ones for the generalizations that are possible (e.g. unify the
DynamicDestinations APIs between files and bigquery, come up with best
practices for data-dependent IOs, etc.).
> Data-dependent sinks
> --------------------
>
> Key: BEAM-92
> URL: https://issues.apache.org/jira/browse/BEAM-92
> Project: Beam
> Issue Type: New Feature
> Components: sdk-java-core
> Reporter: Eugene Kirpichov
> Assignee: Reuven Lax
>
> Current sink API writes all data to a single destination, but there are many
> use cases where different pieces of data need to be routed to different
> destinations where the set of destinations is data-dependent (so can't be
> implemented with a Partition transform).
> One internally discussed proposal was an API of the form:
> {code}
> PCollection<Void> PCollection<T>.apply(
> Write.using(DoFn<T, SinkT> where,
> MapFn<SinkT, WriteOperation<WriteResultT, T>> how)
> {code}
> so an item T gets written to a destination (or multiple destinations)
> determined by "where"; and the writing strategy is determined by "how" that
> produces a WriteOperation (current API - global init/write/global finalize
> hooks) for any given destination.
> This API also has other benefits:
> * allows the SinkT to be computed dynamically (in "where"), rather than
> specified at pipeline construction time
> * removes the necessity for a Sink class entirely
> * is sequenceable w.r.t. downstream transforms (you can stick transforms onto
> the returned PCollection<Void>, while the current Write.to() returns a PDone)
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)