Niel Markwick created BEAM-6921:
-----------------------------------

             Summary: Improve SpannerIO output
                 Key: BEAM-6921
                 URL: https://issues.apache.org/jira/browse/BEAM-6921
             Project: Beam
          Issue Type: New Feature
          Components: io-java-gcp
    Affects Versions: 2.11.0
            Reporter: Niel Markwick


From a discussion in [https://github.com/apache/beam/pull/8097].
 SpannerIO produces 2 output PCollections:
 * getOutput() -> PCollection<Void>
 ** never has any values
 ** in GlobalWindow
 ** Closed when the input PCollection is closed (i.e. never in streaming) to 
indicate when all input has been written
 ** Used in batch pipelines to have 'dependent' bulk imports, where one 
dataset is not written to Spanner until another has finished writing 
(necessary for handling parent/child (1-many) referential integrity)
 * getFailedMutations() -> PCollection<MutationGroup>
 ** only contains values when Mutation[Group]s fail to be written
 ** in GlobalWindow
 ** Not very useful, as the reason for the failure is not given. 

Suggestion: 
 * Deprecate these existing outputs.
 * Make the primary output a PCollection<{MutationGroup, CommitTimestamp}> so 
that the successfully written Mutation[Group]s can be processed further if 
necessary
({a,b} signifies a container class for these values)
 * Add an additional output of failed mutations: PCollection<{MutationGroup, 
FailureMessage}>
 * The existing outputs can be derived from these new outputs

This allows useful error reporting/handling from the failure message, and the 
ability to continue processing the successful mutations. 
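
A minimal sketch of what the two suggested container classes could look like, and of how the existing outputs could be derived from the new ones. Everything here is an assumption for illustration: the names WrittenMutation, FailedMutation, SpannerOutputSketch, legacyFailedMutations and legacyOutput are hypothetical and not part of SpannerIO, the type parameter M stands in for Beam's MutationGroup, and plain Lists stand in for PCollections so that the sketch has no Beam dependency.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical container for a successfully written group: {MutationGroup, CommitTimestamp}.
// M stands in for Beam's MutationGroup so the sketch compiles without Beam.
final class WrittenMutation<M> {
    final M mutationGroup;
    final Instant commitTimestamp;

    WrittenMutation(M mutationGroup, Instant commitTimestamp) {
        this.mutationGroup = mutationGroup;
        this.commitTimestamp = commitTimestamp;
    }
}

// Hypothetical container for a failed group: {MutationGroup, FailureMessage}.
final class FailedMutation<M> {
    final M mutationGroup;
    final String failureMessage;

    FailedMutation(M mutationGroup, String failureMessage) {
        this.mutationGroup = mutationGroup;
        this.failureMessage = failureMessage;
    }
}

final class SpannerOutputSketch {
    // Existing getFailedMutations() equivalent: the new failure output
    // with the failure message dropped.
    static <M> List<M> legacyFailedMutations(List<FailedMutation<M>> failures) {
        List<M> groups = new ArrayList<>();
        for (FailedMutation<M> failure : failures) {
            groups.add(failure.mutationGroup);
        }
        return groups;
    }

    // Existing getOutput() equivalent: it never carries values, so every
    // element of the new success output is discarded.
    static <M> List<Void> legacyOutput(List<WrittenMutation<M>> written) {
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        List<WrittenMutation<String>> written = new ArrayList<>();
        written.add(new WrittenMutation<>("parent-row-1", Instant.now()));

        List<FailedMutation<String>> failed = new ArrayList<>();
        failed.add(new FailedMutation<>("child-row-1", "parent row not found"));

        // With the failure message available, errors can be reported per group.
        for (FailedMutation<String> failure : failed) {
            System.out.println(failure.mutationGroup + " failed: " + failure.failureMessage);
        }
        System.out.println("legacy failed mutations: " + legacyFailedMutations(failed));
        System.out.println("legacy output size: " + legacyOutput(written).size());
    }
}
```

In a real pipeline these would be transforms over PCollections rather than loops over Lists, and the container classes would also need coders; the sketch only shows that the new outputs carry strictly more information than the existing ones.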

 

(see also BEAM-6887)

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
