Jacquelyn Wax created BEAM-11919:
------------------------------------

             Summary: BigQueryIO.read(SerializableFunction): Collect records 
that could not be successfully parsed into the user-provided custom-typed 
object into a PCollection of TableRows
                 Key: BEAM-11919
                 URL: https://issues.apache.org/jira/browse/BEAM-11919
             Project: Beam
          Issue Type: Wish
          Components: io-java-gcp
            Reporter: Jacquelyn Wax


Just as org.apache.beam.sdk.io.gcp.bigquery.WriteResult.getFailedInserts() 
allows a user to collect failed writes for downstream processing (e.g., sinking 
the records into some kind of dead-letter store), could 
BigQueryIO.read(SerializableFunction) expose the TableRows that the 
user-provided function failed to parse, so that they too can be processed 
downstream (e.g., routed to some kind of dead-letter handling)?
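For comparison, here is a minimal sketch of the existing write-side pattern we 
would like mirrored on the read side. The table name and write settings are 
illustrative only, and 'rows' stands in for an existing PCollection<TableRow>:

{code:java}
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.InsertRetryPolicy;
import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
import org.apache.beam.sdk.values.PCollection;

// 'rows' is an existing PCollection<TableRow>; table name and settings are illustrative.
WriteResult writeResult =
    rows.apply(
        BigQueryIO.writeTableRows()
            .to("my-project:my_dataset.my_table")
            .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
            .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors()));

// Rows that BigQuery rejected, available for dead-letter handling downstream.
PCollection<TableRow> failedInserts = writeResult.getFailedInserts();
{code}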

In our use case, all data loaded into our Apache Beam pipeline must conform to 
a specified schema in which certain fields are required to be non-null. It 
would be ideal to collect the records that do not meet the schema and output 
them to some kind of dead-letter store.

Our current implementation requires us to use the slower 
BigQueryIO.readTableRows() and then, in a subsequent transform, attempt to 
parse each TableRow into a custom-typed object, outputting any failures to a 
side output for downstream processing (sketched below). This isn't terribly 
cumbersome, but native support in the connector itself would be a nice feature.
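
Roughly, a sketch of our current workaround, where MyRecord and parseRecord() 
are hypothetical stand-ins for our actual record type and parsing logic, and 
the table name is illustrative:

{code:java}
import com.google.api.services.bigquery.model.TableRow;
import java.io.Serializable;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.SerializableCoder;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;

public class ReadWithDeadLetters {

  // Hypothetical custom-typed record; stands in for our real class.
  static class MyRecord implements Serializable {
    String id;
    Long value;
  }

  // Hypothetical parser enforcing our schema (required fields must be non-null).
  static MyRecord parseRecord(TableRow row) {
    Object id = row.get("id");
    Object value = row.get("value");
    if (id == null || value == null) {
      throw new IllegalArgumentException("Required field missing: " + row);
    }
    MyRecord r = new MyRecord();
    r.id = (String) id;
    r.value = Long.valueOf(value.toString());
    return r;
  }

  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    final TupleTag<MyRecord> parsedTag = new TupleTag<MyRecord>() {};
    final TupleTag<TableRow> failedTag = new TupleTag<TableRow>() {};

    // Slower path: read raw TableRows instead of BigQueryIO.read(SerializableFunction).
    PCollection<TableRow> rows =
        pipeline.apply(
            BigQueryIO.readTableRows().from("my-project:my_dataset.my_table"));

    // Parse in a separate transform so failures can go to a side output.
    PCollectionTuple results =
        rows.apply(
            ParDo.of(
                    new DoFn<TableRow, MyRecord>() {
                      @ProcessElement
                      public void processElement(ProcessContext c) {
                        try {
                          c.output(parseRecord(c.element()));
                        } catch (Exception e) {
                          // Unparseable row: emit to the dead-letter side output.
                          c.output(failedTag, c.element());
                        }
                      }
                    })
                .withOutputTags(parsedTag, TupleTagList.of(failedTag)));

    PCollection<MyRecord> parsed =
        results.get(parsedTag).setCoder(SerializableCoder.of(MyRecord.class));
    PCollection<TableRow> failed = results.get(failedTag);

    // Downstream: 'parsed' feeds the main pipeline, 'failed' goes to a dead-letter sink.
    pipeline.run();
  }
}
{code}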



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
