[ https://issues.apache.org/jira/browse/BEAM-14364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529747#comment-17529747 ]
Darren Norton commented on BEAM-14364:
--------------------------------------
Hi [~chamikara], I was the engineer who reported this issue and was in
communication with Svetak. As part of my investigation I tried changing the
insertRetryPolicy from
.withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors())
to
.withFailedInsertRetryPolicy(InsertRetryPolicy.neverRetry())
and the behavior persisted: no messages were returned in the output
PCollection, and the job would eventually kill the VMs.
> 404s in BigQueryIO don't get output to Failed Inserts PCollection
> -----------------------------------------------------------------
>
> Key: BEAM-14364
> URL: https://issues.apache.org/jira/browse/BEAM-14364
> Project: Beam
> Issue Type: Bug
> Components: io-py-gcp
> Reporter: Svetak Vihaan Sundhar
> Assignee: Svetak Vihaan Sundhar
> Priority: P1
>
> Given that BigQueryIO is configured to use createDisposition(CREATE_NEVER),
> and the DynamicDestinations class returns null for a schema,
> and the table for that destination does not exist in BigQuery,
> when I stream records to BigQuery for that table, then the write should fail
> and the failed rows should appear on the output PCollection for Failed
> Inserts (via getFailedInserts()).
>
> Almost all of the time the table exists beforehand, but since new tables
> can be created, we want this behavior to be non-explosive to the job.
> However, what we are seeing is that processing completely stops in those
> pipelines, and eventually the jobs run out of memory. I feel that the
> appropriate action when BigQuery returns a 404 for the table would be to
> submit those failed TableRows to the output PCollection and continue
> processing as normal.
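For reference, a minimal sketch of the kind of DynamicDestinations the
description above refers to; the class name, project, dataset, and row field
names are all hypothetical. With CREATE_NEVER the pipeline never creates
tables, so returning null from getSchema() is legitimate:

import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import org.apache.beam.sdk.io.gcp.bigquery.DynamicDestinations;
import org.apache.beam.sdk.io.gcp.bigquery.TableDestination;
import org.apache.beam.sdk.values.ValueInSingleWindow;

class ExistingTableDestinations extends DynamicDestinations<TableRow, String> {
  @Override
  public String getDestination(ValueInSingleWindow<TableRow> element) {
    // Route each row by a hypothetical "table_name" field.
    return (String) element.getValue().get("table_name");
  }

  @Override
  public TableDestination getTable(String tableName) {
    return new TableDestination("my-project:my_dataset." + tableName, null);
  }

  @Override
  public TableSchema getSchema(String tableName) {
    return null; // no schema: the table is expected to already exist
  }
}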