[ https://issues.apache.org/jira/browse/BEAM-14364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530206#comment-17530206 ]

Chamikara Madhusanka Jayalath commented on BEAM-14364:
------------------------------------------------------

Happy to look at a PR to improve the performance when you have one, but we have 
to make sure that it works for all cases and does not change the behavior of 
the connector in a backwards-incompatible way.

Regarding the immediate issue: I think we currently send failed records to the 
Failed Inserts PCollection only if the write request to BQ succeeds but some 
records fail to be inserted into BQ. If the request itself fails, we just 
retry, and for streaming we retry failed requests forever until they succeed. 
This is probably what you are running into, and I think it's the expected 
behavior. We might have to document it better.
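To illustrate the distinction, here is a minimal sketch (assuming the Java SDK 
with STREAMING_INSERTS; the class name, table spec, and input row are 
placeholders, not code from the connector) of where failed rows surface:

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.Method;
import org.apache.beam.sdk.io.gcp.bigquery.InsertRetryPolicy;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;

public class FailedInsertsSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Placeholder input; in a real job this would be the streaming source.
    PCollection<TableRow> rows =
        p.apply(Create.of(new TableRow().set("id", 1)).withCoder(TableRowJsonCoder.of()));

    WriteResult result =
        rows.apply(
            BigQueryIO.writeTableRows()
                .to("my-project:my_dataset.my_table") // placeholder table spec
                .withMethod(Method.STREAMING_INSERTS)
                .withCreateDisposition(CreateDisposition.CREATE_NEVER)
                // Per-row failures reported by BigQuery on an otherwise
                // successful insert request are routed to getFailedInserts()
                // once the retry policy gives up; a failure of the whole
                // request (e.g. the table is missing) is retried instead.
                .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors()));

    // Rows that BigQuery rejected and the retry policy did not retry further.
    PCollection<TableRow> failedRows = result.getFailedInserts();

    p.run();
  }
}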

cc: [~reuvenlax] [~pabloem]

> 404s in BigQueryIO don't get output to Failed Inserts PCollection
> -----------------------------------------------------------------
>
>                 Key: BEAM-14364
>                 URL: https://issues.apache.org/jira/browse/BEAM-14364
>             Project: Beam
>          Issue Type: Bug
>          Components: io-py-gcp
>            Reporter: Svetak Vihaan Sundhar
>            Assignee: Svetak Vihaan Sundhar
>            Priority: P1
>         Attachments: ErrorsInPrototypeJob.PNG
>
>
> Given that BigQueryIO is configured to use createDisposition(CREATE_NEVER),
> and the DynamicDestinations class returns "null" for a schema,
> and the table for that destination does not exist in BigQuery, When I stream 
> records to BigQuery for that table, then the write should fail,
> and the failed rows should appear on the output PCollection for Failed 
> Inserts (via getFailedInserts()).
>  
> Almost all of the time, the table exists beforehand, but since new tables 
> can be created, we want this behavior to be non-fatal to the job. However, 
> what we are seeing is that processing completely stops in those pipelines, 
> and eventually the jobs run out of memory. I feel that the appropriate 
> action when BigQuery returns a 404 for the table would be to emit those 
> failed TableRows to the output PCollection and continue processing as 
> normal.
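For reference, a sketch of the configuration the quoted description above is 
talking about (assuming the Java SDK; the class, project, dataset, and the 
"tenant" routing field are hypothetical names, not taken from the report):

import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.DynamicDestinations;
import org.apache.beam.sdk.io.gcp.bigquery.TableDestination;
import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.ValueInSingleWindow;

public class CreateNeverDynamicDestinations {

  // Applies the write described in the issue and returns the failed-insert rows.
  static PCollection<TableRow> writeAndGetFailures(PCollection<TableRow> rows) {
    WriteResult result =
        rows.apply(
            BigQueryIO.writeTableRows()
                .to(
                    new DynamicDestinations<TableRow, String>() {
                      @Override
                      public String getDestination(ValueInSingleWindow<TableRow> element) {
                        // Route each record by a field value; "tenant" is hypothetical.
                        return (String) element.getValue().get("tenant");
                      }

                      @Override
                      public TableDestination getTable(String tenant) {
                        // The table for a new tenant may not exist yet in BigQuery.
                        return new TableDestination(
                            "my-project:my_dataset.events_" + tenant, null);
                      }

                      @Override
                      public TableSchema getSchema(String tenant) {
                        // Null schema; combined with CREATE_NEVER, the
                        // connector never creates the destination table.
                        return null;
                      }
                    })
                .withCreateDisposition(CreateDisposition.CREATE_NEVER));

    // The issue expects rows that hit a missing table (404) to show up here
    // rather than the request being retried indefinitely.
    return result.getFailedInserts();
  }
}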



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
