Jacquelyn Wax created BEAM-11919:
------------------------------------
Summary: BigQueryIO.read(SerializableFunction): Collect records
that could not be successfully parsed into the user-provided custom-typed
object into a PCollection of TableRows
Key: BEAM-11919
URL: https://issues.apache.org/jira/browse/BEAM-11919
Project: Beam
Issue Type: Wish
Components: io-java-gcp
Reporter: Jacquelyn Wax
Just as org.apache.beam.sdk.io.gcp.bigquery.WriteResult.getFailedInserts()
allows a user to collect failed writes for downstream processing (e.g., sinking
the records into some kind of dead-letter store), could
BigQueryIO.read(SerializableFunction) expose the TableRows that the provided
function was unable to parse, so that a user can access them for downstream
processing (e.g., some kind of dead-letter handling)?
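For context, a minimal sketch of the existing write-side pattern this wish is modeled on (table spec and input PCollection are placeholders):

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.InsertRetryPolicy;
import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
import org.apache.beam.sdk.values.PCollection;

// rows is a PCollection<TableRow> produced earlier in the pipeline.
WriteResult writeResult = rows.apply("WriteToBigQuery",
    BigQueryIO.writeTableRows()
        .to("my-project:my_dataset.my_table")  // placeholder table spec
        .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
        .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors()));

// Failed inserts come back as TableRows and can be routed to a dead-letter sink.
PCollection<TableRow> failedInserts = writeResult.getFailedInserts();

Something analogous on the read side would let failed parses be collected as a PCollection of TableRows.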
In our use case, all data loaded into our Apache Beam pipeline must meet a
specified schema in which certain fields are required to be non-null. It would
be ideal to collect the records that do not meet the schema and output them to
some kind of dead-letter store.
Our current implementation requires us to use the slower
BigQueryIO.readTableRows() and then, in a subsequent transform, attempt to
parse the TableRows into a custom-typed object, outputting any failures to a
side output for downstream processing (a rough sketch follows). This isn't
incredibly cumbersome, but it would be a nice feature of the connector itself.
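A rough sketch of that workaround, assuming a hypothetical MyRecord type whose fromTableRow parser throws on schema violations:

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;

final TupleTag<MyRecord> parsedTag = new TupleTag<MyRecord>() {};
final TupleTag<TableRow> failedTag = new TupleTag<TableRow>() {};

PCollectionTuple results = pipeline
    .apply("ReadTableRows",
        BigQueryIO.readTableRows().from("my-project:my_dataset.my_table"))  // placeholder table spec
    .apply("ParseRows", ParDo.of(new DoFn<TableRow, MyRecord>() {
      @ProcessElement
      public void processElement(@Element TableRow row, MultiOutputReceiver out) {
        try {
          // MyRecord.fromTableRow is a hypothetical parser that enforces the schema.
          out.get(parsedTag).output(MyRecord.fromTableRow(row));
        } catch (Exception e) {
          // Rows that fail to parse are routed to the dead-letter side output.
          out.get(failedTag).output(row);
        }
      }
    }).withOutputTags(parsedTag, TupleTagList.of(failedTag)));

PCollection<MyRecord> parsed = results.get(parsedTag);
PCollection<TableRow> deadLetters = results.get(failedTag);
// deadLetters can then be written to whatever dead-letter store is appropriate.

The requested feature would effectively fold this parse-and-split step into BigQueryIO.read(SerializableFunction) itself.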
--
This message was sent by Atlassian Jira
(v8.3.4#803005)