[jira] [Updated] (BEAM-11919) BigQueryIO.read(SerializableFunction): Collect records that could not be parsed into the custom-typed object into a PCollection of TableRows

Jacquelyn Wax (Jira) Wed, 03 Mar 2021 17:31:04 -0800


     [ https://issues.apache.org/jira/browse/BEAM-11919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Jacquelyn Wax updated BEAM-11919:
---------------------------------
    Summary: BigQueryIO.read(SerializableFunction): Collect records that could 
not be parsed into the custom-typed object into a PCollection of TableRows  
(was: BigQueryIO.read(SerializableFunction): Collect records that could not be 
successfully parsed into the user-provided custom-typed object into a 
PCollection of TableRows)

> BigQueryIO.read(SerializableFunction): Collect records that could not be 
> parsed into the custom-typed object into a PCollection of TableRows
> --------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-11919
>                 URL: https://issues.apache.org/jira/browse/BEAM-11919
>             Project: Beam
>          Issue Type: Wish
>          Components: io-java-gcp
>            Reporter: Jacquelyn Wax
>            Priority: P3
>
> Just as org.apache.beam.sdk.io.gcp.bigquery.WriteResult.getFailedInserts()
> lets a user collect failed writes for downstream processing (e.g., sinking
> the records into a dead-letter store), could the results of
> BigQueryIO.read(SerializableFunction) be split so that a user can access the
> TableRows that the provided function failed to parse, again for downstream
> processing (e.g., dead-letter handling)?
> In our use case, all data loaded into our Apache Beam pipeline must meet a
> specified schema in which certain fields are required to be non-null.
> Ideally, we would collect the records that do not meet the schema and output
> them to a dead-letter store.
> Our current implementation has to use the slower
> BigQueryIO.readTableRows() and then, in a subsequent transform, attempt to
> parse each TableRow into the custom-typed object, sending any failures to a
> side output for downstream processing. This is not especially cumbersome,
> but it would be convenient to have this capability in the connector itself.
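The workaround described in the issue (read with readTableRows(), parse in a later step, route failures to a dead-letter output) can be sketched outside Beam as below. This is a minimal, Beam-free sketch under stated assumptions: a plain Map<String, Object> stands in for TableRow (which is a Map-like JSON object in the real client library), and the UserEvent type and its user_id / timestamp_millis fields are hypothetical. In an actual pipeline the same try/catch would sit inside a DoFn that emits to two TupleTags via ParDo.of(...).withOutputTags(...), rather than filling two Lists.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of "parse, else dead-letter"; a Map stands in for TableRow, all names hypothetical. */
final class DeadLetterParseSketch {

    /** Hypothetical custom-typed record whose fields are required to be non-null. */
    static final class UserEvent {
        final String userId;
        final long timestampMillis;

        UserEvent(String userId, long timestampMillis) {
            this.userId = userId;
            this.timestampMillis = timestampMillis;
        }
    }

    /** Successfully parsed records plus the raw rows that could not be parsed. */
    static final class SplitResult {
        final List<UserEvent> parsed = new ArrayList<>();
        final List<Map<String, Object>> deadLetters = new ArrayList<>();
    }

    /** Parses one row; throws if a required field is null or malformed. */
    static UserEvent parse(Map<String, Object> row) {
        Object userId = row.get("user_id");
        Object ts = row.get("timestamp_millis");
        if (userId == null || ts == null) {
            throw new IllegalArgumentException("required field is null in row " + row);
        }
        // Integer columns may arrive as strings in TableRow-style data, so parse from text.
        return new UserEvent(userId.toString(), Long.parseLong(ts.toString()));
    }

    /** Routes each row to either the parsed list or the dead-letter list. */
    static SplitResult split(List<Map<String, Object>> rows) {
        SplitResult result = new SplitResult();
        for (Map<String, Object> row : rows) {
            try {
                result.parsed.add(parse(row));
            } catch (IllegalArgumentException e) {
                // NumberFormatException extends IllegalArgumentException, so bad numbers land here too.
                result.deadLetters.add(row);
            }
        }
        return result;
    }

    /** Builds a row from alternating key/value arguments (null values allowed). */
    static Map<String, Object> row(Object... keyValues) {
        Map<String, Object> m = new HashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            m.put((String) keyValues[i], keyValues[i + 1]);
        }
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row("user_id", "u1", "timestamp_millis", "1614821464000"));
        rows.add(row("user_id", null, "timestamp_millis", "1614821465000"));
        rows.add(row("user_id", "u3", "timestamp_millis", "not-a-number"));

        SplitResult result = split(rows);
        // prints "parsed=1 deadLetters=2"
        System.out.println("parsed=" + result.parsed.size()
                + " deadLetters=" + result.deadLetters.size());
    }
}
```

Keeping the raw row (not just an error message) in the dead-letter output mirrors what getFailedInserts() returns for writes, so failed records can later be inspected or replayed.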



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
