[ 
https://issues.apache.org/jira/browse/BEAM-14364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530212#comment-17530212
 ] 

Bruno Volpato da Cunha commented on BEAM-14364:
-----------------------------------------------

> If the request itself fails, we just retry. And for streaming, we retry 
> failed requests forever until corrected. So this is probably what you are 
> running into, and I think it's the expected behavior. We might have to 
> document this better.

I believe that not having a way to handle/unblock streaming without recreating 
the table (assuming the table even points to something you can actually create) 
can cause major issues. Draining the job doesn't work either, so you may have 
to cancel it forcibly and potentially incur data loss.

In my opinion, this could be something configurable on the InsertRetryPolicy 
itself, with the rows emitted to the Failed Inserts PCollection if the insert 
still doesn't go through.
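
For illustration, a minimal sketch of the direction I mean, built on the 
existing withFailedInsertRetryPolicy hook (the policy and the "notFound" check 
are placeholders of mine; note that today a missing-table 404 fails the whole 
insertAll request before any per-row policy runs, which is exactly the gap):

    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
    import org.apache.beam.sdk.io.gcp.bigquery.InsertRetryPolicy;
    import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;

    // Sketch only: a custom InsertRetryPolicy that gives up on rows whose
    // insert errors report "notFound", so they can flow to the failed-inserts
    // output instead of being retried forever. Today the table-level 404 never
    // reaches this per-row hook, so this shows where the knob could live,
    // not a fix.
    InsertRetryPolicy giveUpOnNotFound =
        new InsertRetryPolicy() {
          @Override
          public boolean shouldRetry(Context context) {
            return context.getInsertErrors().getErrors().stream()
                .noneMatch(e -> "notFound".equals(e.getReason()));
          }
        };

    // rows: a PCollection<TableRow> of records to stream.
    WriteResult result =
        rows.apply(
            BigQueryIO.writeTableRows()
                .to("my-project:my_dataset.my_table")  // placeholder destination
                .withCreateDisposition(CreateDisposition.CREATE_NEVER)
                .withFailedInsertRetryPolicy(giveUpOnNotFound)
                .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS));

    // Rows the policy declined to retry should end up here.
    result.getFailedInserts();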

> 404s in BigQueryIO don't get output to Failed Inserts PCollection
> -----------------------------------------------------------------
>
>                 Key: BEAM-14364
>                 URL: https://issues.apache.org/jira/browse/BEAM-14364
>             Project: Beam
>          Issue Type: Bug
>          Components: io-py-gcp
>            Reporter: Svetak Vihaan Sundhar
>            Assignee: Svetak Vihaan Sundhar
>            Priority: P1
>         Attachments: ErrorsInPrototypeJob.PNG
>
>
> Given that BigQueryIO is configured to use createDisposition(CREATE_NEVER),
> and the DynamicDestinations class returns "null" for a schema,
> and the table for that destination does not exist in BigQuery,
> when I stream records to BigQuery for that table, the write should fail,
> and the failed rows should appear on the output PCollection for Failed 
> Inserts (via getFailedInserts()).
>  
> Almost all of the time the table exists beforehand, but since new tables can 
> be created, we want this behavior not to be fatal to the job. What we are 
> seeing instead is that processing completely stops in those pipelines, and 
> eventually the jobs run out of memory. I feel the appropriate action when 
> BigQuery returns a 404 for the table would be to emit those failed TableRows 
> to the output PCollection and continue processing as normal.
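
For reference, a rough sketch of the scenario described above, using the Java 
API names from the description (project/dataset/table identifiers and the 
input PCollection are placeholders, not from the report):

    import com.google.api.services.bigquery.model.TableRow;
    import com.google.api.services.bigquery.model.TableSchema;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
    import org.apache.beam.sdk.io.gcp.bigquery.DynamicDestinations;
    import org.apache.beam.sdk.io.gcp.bigquery.TableDestination;
    import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.ValueInSingleWindow;

    // CREATE_NEVER, a DynamicDestinations that returns null for the schema,
    // and a destination table that does not exist. Expected: the rows surface
    // on getFailedInserts(); observed: the 404 is retried indefinitely and the
    // pipeline stalls.
    WriteResult result =
        rows.apply(  // rows: PCollection<TableRow>
            BigQueryIO.writeTableRows()
                .to(
                    new DynamicDestinations<TableRow, String>() {
                      @Override
                      public String getDestination(ValueInSingleWindow<TableRow> element) {
                        return "my-project:my_dataset.table_that_does_not_exist";
                      }

                      @Override
                      public TableDestination getTable(String destination) {
                        return new TableDestination(destination, null);
                      }

                      @Override
                      public TableSchema getSchema(String destination) {
                        return null;  // schema intentionally null, per the description
                      }
                    })
                .withCreateDisposition(CreateDisposition.CREATE_NEVER)
                .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS));

    // Where the failed rows were expected to appear.
    PCollection<TableRow> failedInserts = result.getFailedInserts();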



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
