damccorm opened a new issue, #20514:
URL: https://github.com/apache/beam/issues/20514
A user may call `apache_beam.io.gcp.bigquery.WriteToBigQuery` to write streamed
data to BigQuery. If any rows fail to write, the transform returns a tagged
PCollection, `BigQueryWriteFn.FAILED_ROWS`, whose elements are tuples of
`(destination_table, failed_row_payload)`.
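For context, here is a minimal sketch of how a pipeline currently taps the failed rows. The project, topic, table, schema, and `parse_event` function are placeholders; on the SDK versions around the time of this issue, the write result is indexed by the `FAILED_ROWS` tag as shown.

```python
import json
import logging

import apache_beam as beam
from apache_beam.io.gcp.bigquery import BigQueryWriteFn, WriteToBigQuery
from apache_beam.io.gcp.bigquery_tools import RetryStrategy
from apache_beam.options.pipeline_options import PipelineOptions


def parse_event(message):
    # Hypothetical parser: turn a Pub/Sub payload into a BigQuery row dict.
    return json.loads(message.decode('utf-8'))


with beam.Pipeline(options=PipelineOptions(streaming=True)) as p:
    result = (
        p
        | 'ReadEvents' >> beam.io.ReadFromPubSub(
            topic='projects/my-project/topics/events')
        | 'Parse' >> beam.Map(parse_event)
        | 'WriteToBQ' >> WriteToBigQuery(
            table='my-project:my_dataset.my_table',
            schema='id:STRING,payload:STRING',  # placeholder schema
            insert_retry_strategy=RetryStrategy.RETRY_ON_TRANSIENT_ERROR))

    # Each failed element is currently (destination_table, failed_row_payload);
    # nothing in it tells the user *why* the insert was rejected.
    _ = (
        result[BigQueryWriteFn.FAILED_ROWS]
        | 'LogFailures' >> beam.Map(
            lambda kv: logging.warning('Failed row for %s: %s', kv[0], kv[1])))
```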
My suggestion is to include the error information in the `FAILED_ROWS`
PCollection. From the source code we can see that we already have access to the
error details returned by BigQuery, e.g. that field `id` was `invalid` because
`this field is not a record`. I think we should surface this to the user.
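To illustrate the suggestion (this is not a committed design), one possible shape would be to append the per-row error list that the BigQuery `insertAll` response already provides. The element below is purely illustrative; the field names mirror the `insertErrors` format of the insertAll response.

```python
# Illustrative only: one possible element shape for FAILED_ROWS if the
# insertAll error details were surfaced alongside the rejected row.
failed_element = (
    'my-project:my_dataset.my_table',               # destination table
    {'id': 'not-a-record', 'payload': 'x'},         # the rejected row
    [                                               # errors reported for this row
        {'reason': 'invalid',
         'location': 'id',
         'message': 'This field is not a record.'},
    ],
)

destination, row, errors = failed_element
for err in errors:
    print(f"{destination}: field {err['location']!r} was {err['reason']}: "
          f"{err['message']}")
```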
I'm happy to open a PR for this myself (I've already had to override the
original code in several projects), but it looks like this would be a breaking
change: either we extend the tuple, which would cause unpacking issues in
existing code, or we return a different data structure entirely.
Relevant owners:
[~altay]
[[email protected]]
Imported from Jira
[BEAM-10233](https://issues.apache.org/jira/browse/BEAM-10233). Original Jira
may contain additional context.
Reported by: tomhardman0.