reuvenlax commented on issue #21713:
URL: https://github.com/apache/beam/issues/21713#issuecomment-1246266615

   I suspect it does happen. Some context:
   This feature was intended for known persistent failures at the row level, 
and is implemented using the per-row status returned by BigQuery. An example 
would be a row that does not match the BigQuery schema, or a row that exceeds 
BigQuery's size limit. This feature does not capture potentially ephemeral 
failures, such as the RPC to BigQuery itself failing; in that case we simply 
retry the RPC.
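   The row-level distinction described above can be sketched as follows. This is an illustrative simulation, not Beam's actual implementation; the function name `classify_rows` and the size constant are assumptions made for the example.

```python
# Hypothetical sketch: rows with known-persistent problems (schema mismatch,
# oversized row) are routed to the dead-letter output; everything else is
# considered insertable. Names here are illustrative, not Beam's API.

MAX_ROW_BYTES = 10 * 1024 * 1024  # stand-in for BigQuery's per-row size limit

def classify_rows(rows, schema_fields):
    """Split rows into insertable rows and known-persistent failures."""
    ok, dead_letter = [], []
    for row in rows:
        if set(row) - set(schema_fields):
            # Field not present in the table schema: will never succeed.
            dead_letter.append((row, "schema mismatch"))
        elif len(str(row).encode()) > MAX_ROW_BYTES:
            # Exceeds the size limit: will never succeed.
            dead_letter.append((row, "row too large"))
        else:
            ok.append(row)
    return ok, dead_letter
```

   Note that nothing in this classification touches the network: it only covers failures that can be decided per row, which is exactly why RPC-level errors are handled separately.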
   
   While a 404 from the RPC looks like a persistent error, this generally 
isn't the case. For instance, a temporary outage might cause the RPC to 
return 404, yet on retry the RPC succeeds. We didn't want a temporary 
BigQuery outage to cause all data to be sent to the dead-letter output for 
a period of time.
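   The retry-instead-of-dead-letter behavior for RPC-level errors can be sketched like this. Again a hypothetical simulation under stated assumptions: `RpcError`, `insert_with_retry`, and the backoff parameters are invented for the example and are not Beam's actual names.

```python
import time

class RpcError(Exception):
    """Stand-in for an RPC-level failure (e.g. a 404 during an outage)."""
    def __init__(self, code):
        super().__init__(code)
        self.code = code

def insert_with_retry(rpc, rows, max_attempts=5, base_delay=0.01):
    """Retry the whole RPC on failure rather than dead-lettering the rows,
    since the error may be transient and the same rows may succeed later."""
    for attempt in range(max_attempts):
        try:
            return rpc(rows)
        except RpcError:
            if attempt == max_attempts - 1:
                raise  # persistent after all retries: surface the error
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff
```

   The design point is that a transient 404 costs only a retry delay, whereas dead-lettering on 404 would misroute perfectly good rows for the duration of the outage.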
   
   However, I would like to understand the use case more. Is this a case in 
which records destined for a specific table reach the Dataflow pipeline 
before the BigQuery table is created? Is there some offline process creating 
those tables, and is that process simply delayed?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.