[
https://issues.apache.org/jira/browse/BEAM-14383?focusedWorklogId=767432&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-767432
]
ASF GitHub Bot logged work on BEAM-14383:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 06/May/22 22:13
Start Date: 06/May/22 22:13
Worklog Time Spent: 10m
Work Description: pabloem commented on code in PR #17517:
URL: https://github.com/apache/beam/pull/17517#discussion_r867244233
##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1780,8 +1781,9 @@ def _flush_batch(self, destination):
return [
pvalue.TaggedOutput(
BigQueryWriteFn.FAILED_ROWS,
- GlobalWindows.windowed_value((destination, row)))
- for row in failed_rows
+ GlobalWindows.windowed_value((destination, row, row_errors)))
Review Comment:
oh I see that you already updated it. Thanks. Makes sense!
Issue Time Tracking
-------------------
Worklog Id: (was: 767432)
Time Spent: 3.5h (was: 3h 20m)
> Improve "FailedRows" errors returned by beam.io.WriteToBigQuery
> ---------------------------------------------------------------
>
> Key: BEAM-14383
> URL: https://issues.apache.org/jira/browse/BEAM-14383
> Project: Beam
> Issue Type: Improvement
> Components: io-py-gcp
> Reporter: Oskar Firlej
> Priority: P2
> Time Spent: 3.5h
> Remaining Estimate: 0h
>
> `WriteToBigQuery` pipeline returns `errors` when trying to insert rows that
> do not match the BigQuery table schema. `errors` is a dictionary that
> cointains one `FailedRows` key. `FailedRows` is a list of tuples where each
> tuple has two elements: BigQuery table name and the row that didn't match the
> schema.
> This can be verified by running the `BigQueryIO deadletter pattern`
> https://beam.apache.org/documentation/patterns/bigqueryio/
> Using this approach I can print the failed rows in a pipeline. When running
> the job, logger simultaneously prints out the reason why the rows were
> invalid. The reason should also be included in the tuple in addition to the
> BigQuery table and the raw row. This way next pipeline could process both the
> invalid row and the reason why it is invalid.
> During my reasearch i found a couple of alternate solutions, but i think they
> are more complex than they need to be. Thats why i explored the beam source
> code and found the solution to be an easy and simple change.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)