[ 
https://issues.apache.org/jira/browse/BEAM-14383?focusedWorklogId=768027&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-768027
 ]

ASF GitHub Bot logged work on BEAM-14383:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/May/22 17:02
            Start Date: 09/May/22 17:02
    Worklog Time Spent: 10m 
      Work Description: pabloem commented on PR #17517:
URL: https://github.com/apache/beam/pull/17517#issuecomment-1121351370

   my bad. I am adding a fix here: https://github.com/apache/beam/pull/17584
   
   On Mon, May 9, 2022 at 10:01 AM Brian Hulette ***@***.***>
   wrote:
   
   > It looks like we didn't get a green run on the Python PostCommit before
   > merging, the new test is failing at HEAD. I filed BEAM-14447
   > <https://issues.apache.org/jira/browse/BEAM-14447> to track the failure.
   > Could you take a look @Firlej <https://github.com/Firlej>?
   >
   > If it's not quick to diagnose and fix, we might just roll back this PR to
   > preserve test signal. It's easy enough to roll it forward with a fix once
   > we figure it out.
   >
   




Issue Time Tracking
-------------------

    Worklog Id:     (was: 768027)
    Time Spent: 4h 10m  (was: 4h)

> Improve "FailedRows" errors returned by beam.io.WriteToBigQuery
> ---------------------------------------------------------------
>
>                 Key: BEAM-14383
>                 URL: https://issues.apache.org/jira/browse/BEAM-14383
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-py-gcp
>            Reporter: Oskar Firlej
>            Priority: P2
>             Fix For: 2.39.0
>
>          Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> `WriteToBigQuery` pipeline returns `errors` when trying to insert rows that 
> do not match the BigQuery table schema. `errors` is a dictionary that 
> contains one `FailedRows` key. `FailedRows` is a list of tuples where each 
> tuple has two elements: BigQuery table name and the row that didn't match the 
> schema.
> This can be verified by running the `BigQueryIO deadletter pattern` 
> https://beam.apache.org/documentation/patterns/bigqueryio/
> Using this approach I can print the failed rows in a pipeline. When running 
> the job, the logger simultaneously prints out the reason why the rows were 
> invalid. The reason should also be included in the tuple in addition to the 
> BigQuery table and the raw row. This way the next pipeline step could process both the 
> invalid row and the reason why it is invalid.
> During my research I found a couple of alternative solutions, but I think they 
> are more complex than they need to be. That's why I explored the Beam source 
> code and found the solution to be an easy and simple change.
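For context, below is a minimal sketch of the BigQueryIO deadletter pattern referenced in the description. The table spec, schema, and input rows are placeholder assumptions for illustration only; they are not taken from the PR under discussion.

```python
import apache_beam as beam
from apache_beam.io.gcp.bigquery_tools import RetryStrategy

# Hypothetical placeholders, not values from the PR.
table_spec = 'my-project:my_dataset.my_table'
table_schema = 'id:INTEGER,name:STRING'

with beam.Pipeline() as p:
    rows = p | 'CreateRows' >> beam.Create([
        {'id': 1, 'name': 'valid row'},
        {'id': 'not-an-int', 'name': 'row that violates the schema'},
    ])

    # Streaming inserts with no retries, so schema-violating rows land in the
    # 'FailedRows' output instead of being retried.
    errors = rows | 'WriteToBigQuery' >> beam.io.WriteToBigQuery(
        table_spec,
        schema=table_schema,
        method='STREAMING_INSERTS',
        insert_retry_strategy=RetryStrategy.RETRY_NEVER,
        create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
        write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)

    # Each element of errors['FailedRows'] is a (table, row) tuple; the issue
    # asks for the insert error reason to be carried along as well.
    _ = (
        errors['FailedRows']
        | 'PrintFailedRows' >> beam.Map(lambda failed: print('Failed:', failed)))
```

With the change requested in this issue, a downstream transform consuming `FailedRows` would also receive the reason the insert was rejected, rather than having to correlate it with log output.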



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
