Niel Markwick created BEAM-6921:
-----------------------------------
Summary: Improve SpannerIO output
Key: BEAM-6921
URL: https://issues.apache.org/jira/browse/BEAM-6921
Project: Beam
Issue Type: New Feature
Components: io-java-gcp
Affects Versions: 2.11.0
Reporter: Niel Markwick
From a discussion in [https://github.com/apache/beam/pull/8097]:
SpannerIO currently produces two output PCollections:
* getOutput() -> PCollection<Void>
** never has any values
** in GlobalWindow
** Closed when the input PCollection is closed (i.e. never in streaming) to
indicate that all input has been written
** Used in batch pipelines for 'dependent' bulk imports, where one
dataset is not written to Spanner until another has finished writing
(necessary for preserving parent/child (one-to-many) referential integrity)
* getFailedMutations() -> PCollection<MutationGroup>
** only contains values when Mutation[Group]s fail to be written
** in GlobalWindow
** Not very useful, as the reason for the failure is not given.
Suggestion:
* Deprecate these existing outputs.
* Make the primary output a PCollection<{ MutationGroup, CommitTimestamp }> so
that the successfully written Mutation[Group]s can be processed further if
necessary.
({a, b} signifies a container class for these values)
* Add an additional output of failed mutations: PCollection<{ MutationGroup,
FailureMessage }>
* The existing outputs can be derived from these new outputs
This enables useful error reporting and handling based on the failure message,
and lets pipelines continue processing the successfully written mutations.
(see also BEAM-6887)
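As a rough illustration of the two container classes suggested above, here is a minimal plain-Java sketch. All names in it (SpannerWriteResultSketch, WrittenMutation, FailedMutation, legacyFailedMutations) are hypothetical and do not exist in Beam. The type parameter G stands in for MutationGroup, and java.time.Instant is only an assumed stand-in for the commit timestamp type, so that the example compiles without Beam or the Spanner client on the classpath.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: none of these class or method names exist in Beam.
// G stands in for MutationGroup so the example compiles without Beam dependencies.
final class SpannerWriteResultSketch {

  // { MutationGroup, CommitTimestamp }: a group that was written successfully.
  // Instant is an assumed stand-in for the Spanner commit timestamp type.
  static final class WrittenMutation<G> {
    final G mutationGroup;
    final Instant commitTimestamp;

    WrittenMutation(G mutationGroup, Instant commitTimestamp) {
      this.mutationGroup = mutationGroup;
      this.commitTimestamp = commitTimestamp;
    }
  }

  // { MutationGroup, FailureMessage }: a group that could not be written,
  // together with the reason for the failure.
  static final class FailedMutation<G> {
    final G mutationGroup;
    final String failureMessage;

    FailedMutation(G mutationGroup, String failureMessage) {
      this.mutationGroup = mutationGroup;
      this.failureMessage = failureMessage;
    }
  }

  // Derives the shape of the existing getFailedMutations() output from the
  // proposed failure output by dropping the failure messages.
  static <G> List<G> legacyFailedMutations(List<FailedMutation<G>> failures) {
    List<G> groups = new ArrayList<>();
    for (FailedMutation<G> failure : failures) {
      groups.add(failure.mutationGroup);
    }
    return groups;
  }

  private SpannerWriteResultSketch() {}
}
```

In a real pipeline these would be PCollection element types with coders, and the derivation would be a transform over the new failure output instead of a loop over a List. The existing getOutput() signal could presumably be derived in a similar way by discarding every element of the new primary output, though that is an assumption about how the deprecation path would be implemented.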
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)