[PR] Increase retry backoff for Storage API batch [beam]

via GitHub Wed, 10 Jul 2024 17:26:30 -0700


ahmedabu98 opened a new pull request, #31837:
URL: https://github.com/apache/beam/pull/31837


   After fixing the concurrent connections issue (#31710), the only blocker to 
making Storage API batch scalable is managing the AppendRows throughput quota. The 
Storage API backend enforces this quota as a short-term (cell) quota and 
a long-term (region) quota:
   - short-term quota can take up to 10s to refill
   - long-term quota is an aggregate of multiple cells and can take up to 10min 
to refill
   
   It's important to note that all append operations are rejected while a quota 
is being refilled. 
   
   The standard throughput quota is not sufficient for large writes. Large 
pipelines will typically exhaust the long-term quota quickly, leading to 
consistent failures for 10 min. With enough failures (10 failures per bundle, 4 
failed bundles per Dataflow pipeline), the pipeline eventually gives up and 
fails.
   
   ### To deal with this, we can increase the retry backoff so that pipelines 
can survive until the throughput quota is refilled.
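
To illustrate why a larger backoff cap lengthens the time a bundle survives, here is a minimal sketch of capped exponential backoff. The function name `worst_case_wait` and every parameter value below are hypothetical and chosen only for illustration; they are not the settings used in this PR. Only the limit of 10 attempts per bundle comes from the description above.

```python
def worst_case_wait(initial_s, multiplier, max_backoff_s, max_retries):
    """Total seconds slept if every attempt fails (no jitter)."""
    total = 0.0
    delay = initial_s
    for _ in range(max_retries):
        # Each sleep grows geometrically but never exceeds the cap.
        total += min(delay, max_backoff_s)
        delay *= multiplier
    return total


# Hypothetical settings, NOT the values used in the PR.
low_cap = worst_case_wait(initial_s=1, multiplier=2, max_backoff_s=10, max_retries=10)
high_cap = worst_case_wait(initial_s=1, multiplier=2, max_backoff_s=60, max_retries=10)

print(low_cap)   # 75.0
print(high_cap)  # 303.0
```

With the same number of attempts, raising only the cap multiplies the time a bundle keeps retrying before it is marked as failed.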
   
   
   ## Disclaimer:
   
   Before this change, in the worst case where all append operations fail, each 
bundle will retry for:
   - 13 seconds for non-quota errors
   - 66 seconds for quota errors
   
   With 4 bundle failures, the total wait time comes to 52s (non-quota 
errors) and 4.4min (quota errors) before the pipeline fails.
   
   ----------------
   With this change, the worst-case wait time per bundle goes up to:
   - 113 seconds (1.9 min) for non-quota errors 
   - 340 seconds (5.7 min) for quota errors
   
   A Dataflow pipeline will fail after about 7.5min (non-quota errors) or about 
22.5min (quota errors).
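
The arithmetic behind these totals can be sketched as follows. This assumes the 4 bundle failures happen one after another and each one hits the per-bundle worst case; the names `per_bundle` and `pipeline_wait_s` are hypothetical. Totals are computed from the rounded per-bundle figures quoted above, so they may differ slightly from the quoted pipeline totals.

```python
BUNDLE_FAILURES = 4            # failed bundles before Dataflow gives up (from the PR text)
LONG_TERM_REFILL_S = 10 * 60   # long-term quota refill, up to 10 min (from the PR text)

# Worst-case per-bundle retry time in seconds, as quoted in the description.
per_bundle = {
    "before": {"non_quota": 13, "quota": 66},
    "after": {"non_quota": 113, "quota": 340},
}


def pipeline_wait_s(per_bundle_s, bundle_failures=BUNDLE_FAILURES):
    """Worst-case seconds before the pipeline fails, assuming sequential bundle failures."""
    return per_bundle_s * bundle_failures


for label, waits in per_bundle.items():
    total = pipeline_wait_s(waits["quota"])
    outlasts = total >= LONG_TERM_REFILL_S
    print(f"{label}: {total} s ({total / 60:.1f} min), outlasts refill: {outlasts}")
```

Under these assumptions, the quota-error wait before the change (264 s) ends well inside the 10 min refill window, while the wait after the change (1360 s) extends past it, which is the point of the larger backoff.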


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

