ng-oliver opened a new issue, #35978: URL: https://github.com/apache/beam/issues/35978
Currently in `WriteToBigQuery(PTransform)`, the [`retry_strategy`](https://github.com/apache/beam/blob/4c9799388c0386920fa2c058c5b66b8a9b0505bd/sdks/python/apache_beam/io/gcp/bigquery.py#L1444) argument is only effective when `method = streaming_inserts`. It would be very helpful if the argument were also effective when `method = storage_write_api`, to give users more control over the retry mechanism.

I ran into an issue today:

- My production database had a row of data that was 26MB.
- When Dataflow read the 26MB row and attempted to write it into the BigQuery warehouse, it hit BigQuery's 10MB row size limit for the Storage Write API.
- Even though I had configured no retries in my Dataflow job, the argument is not taken into consideration under `method = storage_write_api`, so Dataflow retried indefinitely until I drained the job and started a new one.

**Why this is a problem**

- Because retries are made indefinitely, all downstream error handling, such as writing to dead letter tables, never executes.
- My only option was to change the application side so that API calls are made in chunks of a reasonable size, ensuring that any row landing in production stays under 10MB.

To be fair, the chance of having a row of data larger than 10MB in production is low, but it would still be very valuable to let users handle errors effectively under `method = storage_write_api`, regardless of the type of error. A minimal sketch of the configuration I would like to see honored is shown below.
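For concreteness, here is a minimal sketch of the kind of pipeline this would affect. The project, dataset, table names, and dead-letter path are hypothetical; `insert_retry_policy` is the `WriteToBigQuery` constructor argument corresponding to the linked `retry_strategy`, and today it only takes effect for streaming inserts. The comments mark the behavior I would expect under the Storage Write API, not what the SDK currently guarantees.

```python
import apache_beam as beam
from apache_beam.io.gcp.bigquery import WriteToBigQuery
from apache_beam.io.gcp.bigquery_tools import RetryStrategy

with beam.Pipeline() as p:
    # Hypothetical input containing one oversized (~26MB) row.
    rows = p | beam.Create([{"id": 1, "payload": "x" * (26 * 1024 * 1024)}])

    result = rows | "WriteToBQ" >> WriteToBigQuery(
        table="my_project:my_dataset.my_table",   # hypothetical table
        schema="id:INTEGER,payload:STRING",
        method=WriteToBigQuery.Method.STORAGE_WRITE_API,
        # Honored today only for STREAMING_INSERTS; the request is for this
        # policy to also apply under STORAGE_WRITE_API so a row that exceeds
        # the 10MB limit is not retried forever.
        insert_retry_policy=RetryStrategy.RETRY_NEVER,
    )

    # Desired behavior: permanently failing rows flow to a dead-letter sink
    # instead of being retried indefinitely. A failed-rows output is what the
    # streaming-inserts path exposes today; its availability and naming for
    # the Storage Write API path may differ by SDK version.
    (result.failed_rows_with_errors
     | "FormatFailures" >> beam.Map(str)
     | "WriteDeadLetter" >> beam.io.WriteToText(
           "gs://my-bucket/dead-letter/failed"))  # hypothetical path
```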
