shekhars-li opened a new pull request #1592:
URL: https://github.com/apache/samza/pull/1592


   Symptoms: On receiving a retriable failure for a blob operation (such as 503, 
service temporarily unavailable), the async operations were attempted only 3 
times (the initial failed attempt plus 2 retries). This is not the expected 
behavior. Async blob operations are expected to be retried for up to 10 minutes 
(with backoff).
   
   Cause: On receiving a retriable failure from the blob store, the operations 
are retried with a backoff, as defined by the RetryPolicy in the 
executeAsyncWithRetries method 
[here](https://github.com/apache/samza/blob/master/samza-core/src/main/java/org/apache/samza/util/FutureUtil.java#L145).
 The intention was to limit the retries by max duration (10 minutes). However, 
the default attempt limit in [RetryPolicy is 
3](https://frontbackend.com/java/failsafe-retry-policy) (the initial attempt 
plus 2 retries) unless it is overridden, so retries stopped long before the 
10-minute limit was reached. 
   
   Fix: Override the max attempts in the RetryPolicy used for async executions 
to be unlimited (by setting it to -1), so that the max duration, rather than 
the attempt count, limits the retries. 
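
   To illustrate why the attempt limit was reached long before the duration 
limit, here is a minimal stdlib-only sketch. It does not use Failsafe or the 
Samza code; the class `RetryBudgetSketch`, the method `attemptsMade`, and the 
backoff numbers (100 ms initial delay, 5x growth, 312.5 s cap) are hypothetical 
and chosen for illustration. Time is simulated and every attempt is assumed to 
fail with a retriable error.

```java
import java.time.Duration;

/** Stdlib-only model of retries bounded by attempt count and total duration. */
final class RetryBudgetSketch {
    /** Hypothetical marker for "no attempt limit", mirroring the -1 in this PR. */
    static final int UNLIMITED = -1;

    /**
     * Counts the attempts made against an operation that always fails with a
     * retriable error. Delays are added to a simulated clock, nothing sleeps.
     */
    static int attemptsMade(int maxAttempts, Duration maxDuration,
                            Duration initialDelay, Duration maxDelay, double factor) {
        long elapsedMs = 0;
        long delayMs = initialDelay.toMillis();
        int attempts = 0;
        while (true) {
            attempts++; // the attempt itself is modeled as instantaneous and failing
            if (maxAttempts != UNLIMITED && attempts >= maxAttempts) {
                return attempts; // attempt budget exhausted
            }
            if (elapsedMs + delayMs > maxDuration.toMillis()) {
                return attempts; // the next retry would start past the duration budget
            }
            elapsedMs += delayMs; // wait out the backoff delay
            delayMs = Math.min((long) (delayMs * factor), maxDelay.toMillis());
        }
    }
}
```

   In this model, 
`attemptsMade(3, Duration.ofMinutes(10), Duration.ofMillis(100), Duration.ofMillis(312_500), 5.0)` 
returns 3 after only 600 ms of simulated backoff, while passing 
`RetryBudgetSketch.UNLIMITED` instead of 3 returns 7, with the 10-minute budget 
as the stopping condition. A real retry library may treat the duration boundary 
differently, so the exact count is specific to this sketch.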
   
   Test: Added a unit test for one of the async blob operations to verify the 
correct behavior of the 
[executeAsyncWithRetries](https://github.com/apache/samza/blob/master/samza-core/src/main/java/org/apache/samza/util/FutureUtil.java#L145)
 method.

