[
https://issues.apache.org/jira/browse/BEAM-11919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jacquelyn Wax updated BEAM-11919:
---------------------------------
Summary: BigQueryIO.read(SerializableFunction): Collect records that could
not be parsed into the custom-typed object into a PCollection of TableRows
(was: BigQueryIO.read(SerializableFunction): Collect records that could not be
successfully parsed into the user-provided custom-typed object into a
PCollection of TableRows)
> BigQueryIO.read(SerializableFunction): Collect records that could not be
> parsed into the custom-typed object into a PCollection of TableRows
> --------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: BEAM-11919
> URL: https://issues.apache.org/jira/browse/BEAM-11919
> Project: Beam
> Issue Type: Wish
> Components: io-java-gcp
> Reporter: Jacquelyn Wax
> Priority: P3
>
> Just as org.apache.beam.sdk.io.gcp.bigquery.WriteResult.getFailedInserts()
> allows a user to collect failed writes for downstream processing (e.g.,
> sinking the records into some kind of dead-letter store), could the results of
> a BigQueryIO.read(SerializableFunction) be collected, allowing a user to
> access TableRows that the provided function could not parse, for the
> purpose of downstream processing (e.g., some kind of dead-letter
> handling)?
> In our use case, all data loaded into our Apache Beam pipeline must meet a
> specified schema, where certain fields are required to be non-null. It would
> be ideal to collect records that do not meet the schema and output them to
> some kind of dead-letter store.
> Our current implementation requires us to use the slower
> BigQueryIO.readTableRows() and then attempt, in a subsequent transform, to
> parse these TableRows into a custom-typed object, outputting any failures to
> a side output for downstream processing. This isn't incredibly cumbersome,
> but it would be a nice feature of the connector itself.
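
For illustration, the parse-or-dead-letter routing described above can be sketched in plain Java, without Beam on the classpath. This is only a sketch under stated assumptions: DeadLetterSketch, Event, ParseResult, and the field names "id" and "timestamp" are hypothetical, and a Map<String, Object> stands in for a TableRow (which is itself a Map<String, Object>). In a real pipeline the two lists would correspond to the main output and a side output of the parsing transform.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DeadLetterSketch {

    /** Hypothetical custom type with two required (non-null) fields. */
    static final class Event {
        final String id;
        final long timestamp;

        Event(String id, long timestamp) {
            this.id = id;
            this.timestamp = timestamp;
        }
    }

    /** Holds both outputs: parsed objects and the raw rows that failed to parse. */
    static final class ParseResult {
        final List<Event> parsed = new ArrayList<>();
        final List<Map<String, Object>> deadLetters = new ArrayList<>();
    }

    /** Throws if a required field is missing, null, or of the wrong type. */
    static Event parse(Map<String, Object> row) {
        Object id = row.get("id");
        Object ts = row.get("timestamp");
        if (!(id instanceof String) || !(ts instanceof Number)) {
            throw new IllegalArgumentException("row violates schema: " + row);
        }
        return new Event((String) id, ((Number) ts).longValue());
    }

    /** Routes each row to the parsed list or, on failure, to the dead-letter list. */
    static ParseResult parseAll(List<Map<String, Object>> rows) {
        ParseResult result = new ParseResult();
        for (Map<String, Object> row : rows) {
            try {
                result.parsed.add(parse(row));
            } catch (RuntimeException e) {
                // Keep the original row so it can be written to a dead-letter store.
                result.deadLetters.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> good = new HashMap<>();
        good.put("id", "a1");
        good.put("timestamp", 1L);

        Map<String, Object> bad = new HashMap<>();
        bad.put("id", null); // violates the non-null requirement
        bad.put("timestamp", 2L);

        ParseResult r = parseAll(Arrays.asList(good, bad));
        System.out.println(r.parsed.size() + " parsed, "
                + r.deadLetters.size() + " dead-lettered");
    }
}
```

The failed row is kept in its raw form rather than as an error message, so a downstream step can still write it to a dead-letter store unchanged.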