damccorm opened a new issue, #20514:
URL: https://github.com/apache/beam/issues/20514
A user may call `apache_beam.io.gcp.bigquery.WriteToBigQuery` to write streamed
data to BigQuery. If any rows fail to write, the transform returns a tagged
PCollection, `BigQueryWriteFn.FAILED_ROWS`, whose elements are tuples of
`(destination_table, failed_row_payload)`.
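For context, here is a minimal sketch of how a pipeline currently taps the failed rows. The project, topic, table, schema, and `parse_event` function are placeholders; on the SDK versions around the time of this issue, the write result is indexed by the `FAILED_ROWS` tag as shown.

```python
import json
import logging

import apache_beam as beam
from apache_beam.io.gcp.bigquery import BigQueryWriteFn, WriteToBigQuery
from apache_beam.io.gcp.bigquery_tools import RetryStrategy
from apache_beam.options.pipeline_options import PipelineOptions


def parse_event(message):
    # Hypothetical parser: turn a Pub/Sub payload into a BigQuery row dict.
    return json.loads(message.decode('utf-8'))


with beam.Pipeline(options=PipelineOptions(streaming=True)) as p:
    result = (
        p
        | 'ReadEvents' >> beam.io.ReadFromPubSub(
            topic='projects/my-project/topics/events')
        | 'Parse' >> beam.Map(parse_event)
        | 'WriteToBQ' >> WriteToBigQuery(
            table='my-project:my_dataset.my_table',
            schema='id:STRING,payload:STRING',  # placeholder schema
            insert_retry_strategy=RetryStrategy.RETRY_ON_TRANSIENT_ERROR))

    # Each failed element is currently (destination_table, failed_row_payload);
    # nothing in it tells the user *why* the insert was rejected.
    _ = (
        result[BigQueryWriteFn.FAILED_ROWS]
        | 'LogFailures' >> beam.Map(
            lambda kv: logging.warning('Failed row for %s: %s', kv[0], kv[1])))
```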
My suggestion is to include the error information in the `FAILED_ROWS`
PCollection. From the source code we can see that we already have access to the
error details returned by BigQuery, e.g. that field `id` was `invalid` because
`this field is not a record`. I think we should surface this to the user.
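To illustrate the suggestion (this is not a committed design), one possible shape would be to append the per-row error list that the BigQuery `insertAll` response already provides. The element below is purely illustrative; the field names mirror the `insertErrors` format of the insertAll response.

```python
# Illustrative only: one possible element shape for FAILED_ROWS if the
# insertAll error details were surfaced alongside the rejected row.
failed_element = (
    'my-project:my_dataset.my_table',               # destination table
    {'id': 'not-a-record', 'payload': 'x'},         # the rejected row
    [                                               # errors reported for this row
        {'reason': 'invalid',
         'location': 'id',
         'message': 'This field is not a record.'},
    ],
)

destination, row, errors = failed_element
for err in errors:
    print(f"{destination}: field {err['location']!r} was {err['reason']}: "
          f"{err['message']}")
```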
I'm happy to open a PR for this myself (I've already had to override the
original code in several projects), but it looks like this would be a breaking
change: either we extend the tuple, which would cause unpacking issues in
existing code, or we return a different data structure entirely.
Relevant owners:
[~altay]
[[email protected]]
Imported from Jira
[BEAM-10233](https://issues.apache.org/jira/browse/BEAM-10233). Original Jira
may contain additional context.
Reported by: tomhardman0.