damccorm opened a new issue, #20704:
URL: https://github.com/apache/beam/issues/20704

   Just as org.apache.beam.sdk.io.gcp.bigquery.WriteResult.getFailedInserts() 
allows a user to collect failed writes for downstream processing (e.g., sinking 
the records into some kind of dead-letter store), could the results of a 
BigQueryIO.read(SerializableFunction) be collected, allowing a user to access 
TableRows that the provided function was unable to parse, for the purpose of 
downstream processing (e.g., some kind of dead-letter handling)?
   
   In our use case, all data loaded into our Apache Beam pipeline must conform 
to a specified schema in which certain fields are required to be non-null. It 
would be ideal to collect records that do not meet the schema and output them 
to some kind of dead-letter store.
   
   Our current implementation requires us to use the slower 
BigQueryIO.readTableRows() and then, in a subsequent transform, attempt to 
parse those TableRows into a custom typed object, outputting any failures to a 
side output for downstream processing. This isn't terribly cumbersome, but it 
would be a nice feature of the connector itself.
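   A minimal sketch of the workaround described above, assuming Beam's Java 
SDK: read raw TableRows with BigQueryIO.readTableRows(), then split parse 
failures to a dead-letter side output via a multi-output ParDo. The 
ParsedRecord type, the fromTableRow logic, and the table name are illustrative 
assumptions, not part of the Beam API.

   ```java
   import com.google.api.services.bigquery.model.TableRow;
   import org.apache.beam.sdk.Pipeline;
   import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
   import org.apache.beam.sdk.transforms.DoFn;
   import org.apache.beam.sdk.transforms.ParDo;
   import org.apache.beam.sdk.values.PCollection;
   import org.apache.beam.sdk.values.PCollectionTuple;
   import org.apache.beam.sdk.values.TupleTag;
   import org.apache.beam.sdk.values.TupleTagList;

   public class DeadLetterReadSketch {

     // Hypothetical typed record; parsing fails when a required field is null.
     static class ParsedRecord implements java.io.Serializable {
       final String id;

       ParsedRecord(String id) {
         this.id = id;
       }

       static ParsedRecord fromTableRow(TableRow row) {
         Object id = row.get("id");
         if (id == null) {
           throw new IllegalArgumentException("required field 'id' is null");
         }
         return new ParsedRecord(id.toString());
       }
     }

     static final TupleTag<ParsedRecord> PARSED_TAG = new TupleTag<ParsedRecord>() {};
     static final TupleTag<TableRow> DEAD_LETTER_TAG = new TupleTag<TableRow>() {};

     public static void buildPipeline(Pipeline pipeline) {
       // Slower raw read, since BigQueryIO.read(SerializableFunction) offers
       // no way to recover rows the function fails to parse.
       PCollection<TableRow> rows =
           pipeline.apply(BigQueryIO.readTableRows().from("my-project:dataset.table"));

       // Subsequent transform: attempt the parse, route failures to a side output.
       PCollectionTuple result = rows.apply(
           ParDo.of(new DoFn<TableRow, ParsedRecord>() {
                 @ProcessElement
                 public void process(@Element TableRow row, MultiOutputReceiver out) {
                   try {
                     out.get(PARSED_TAG).output(ParsedRecord.fromTableRow(row));
                   } catch (Exception e) {
                     out.get(DEAD_LETTER_TAG).output(row);
                   }
                 }
               })
               .withOutputTags(PARSED_TAG, TupleTagList.of(DEAD_LETTER_TAG)));

       PCollection<ParsedRecord> parsed = result.get(PARSED_TAG);
       PCollection<TableRow> failures = result.get(DEAD_LETTER_TAG);
       // failures would then be written to a dead-letter sink of choice.
     }
   }
   ```

   If the connector exposed the failed TableRows directly, the try/catch ParDo 
above would be unnecessary and the faster read path could be used.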
   
   Imported from Jira 
[BEAM-11919](https://issues.apache.org/jira/browse/BEAM-11919). Original Jira 
may contain additional context.
   Reported by: jacquelynwax.

