shekhars-li opened a new pull request #1592: URL: https://github.com/apache/samza/pull/1592
Symptoms: On receiving a retriable failure for a blob operation (such as 503, service temporarily unavailable), the async operations were attempted only 3 times (the first failed attempt plus 2 retries). This is not the expected behavior: async blob operations are expected to be retried with backoff for up to 10 minutes.

Cause: On receiving a retriable failure from the blob store, operations are retried with a backoff defined by the RetryPolicy in the executeAsyncWithRetries method [here](https://github.com/apache/samza/blob/master/samza-core/src/main/java/org/apache/samza/util/FutureUtil.java#L145). The intention was to limit retries by max duration (10 minutes). However, the default maximum number of attempts in [RetryPolicy is 3](https://frontbackend.com/java/failsafe-retry-policy) unless it is overridden, so the attempt limit was reached before the max duration.

Fix: Override max attempts in the RetryPolicy for async executions to unlimited (by setting it to -1).

Test: Added a unit test for one of the async blob operations to verify the correct behavior of the [executeAsyncWithRetries](https://github.com/apache/samza/blob/master/samza-core/src/main/java/org/apache/samza/util/FutureUtil.java#L145) method.
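
To illustrate the interaction described above, here is a minimal, self-contained sketch of a retry loop bounded by both a max attempt count and a max duration. It is not the code in FutureUtil and does not use the real RetryPolicy API: `RetryLimitSketch`, `SimpleRetryPolicy`, `runWithRetries`, and `flakyOperation` are hypothetical names, the loop is synchronous rather than async, and the 10 minute window is scaled down to 10 seconds with a 1 ms fixed backoff so the example finishes quickly.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class RetryLimitSketch {

    /** Hypothetical stand-in for a retry policy; NOT the real RetryPolicy API. */
    public static final class SimpleRetryPolicy {
        public final int maxAttempts;       // -1 means unlimited attempts
        public final Duration maxDuration;  // total time budget for all attempts
        public final Duration backoff;      // fixed delay between attempts

        public SimpleRetryPolicy(int maxAttempts, Duration maxDuration, Duration backoff) {
            this.maxAttempts = maxAttempts;
            this.maxDuration = maxDuration;
            this.backoff = backoff;
        }
    }

    /**
     * Runs the action until it returns true, the attempt limit is reached,
     * or the max duration elapses. Returns the number of attempts made.
     */
    public static int runWithRetries(SimpleRetryPolicy policy, Supplier<Boolean> action) {
        long deadline = System.nanoTime() + policy.maxDuration.toNanos();
        int attempts = 0;
        while (true) {
            attempts++;
            if (action.get()) {
                return attempts; // operation succeeded
            }
            boolean attemptsExhausted = policy.maxAttempts != -1 && attempts >= policy.maxAttempts;
            boolean timeExhausted = System.nanoTime() >= deadline;
            if (attemptsExhausted || timeExhausted) {
                return attempts; // gave up
            }
            try {
                Thread.sleep(policy.backoff.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return attempts; // stop retrying if interrupted
            }
        }
    }

    /** Simulated blob operation: fails the given number of times (e.g. with a 503), then succeeds. */
    public static Supplier<Boolean> flakyOperation(int failuresBeforeSuccess) {
        AtomicInteger calls = new AtomicInteger();
        return () -> calls.incrementAndGet() > failuresBeforeSuccess;
    }

    public static void main(String[] args) {
        Duration window = Duration.ofSeconds(10); // scaled-down stand-in for 10 minutes
        Duration backoff = Duration.ofMillis(1);

        // Attempt limit of 3: the loop gives up even though the time budget is barely used.
        SimpleRetryPolicy limited = new SimpleRetryPolicy(3, window, backoff);
        System.out.println("limited attempts: " + runWithRetries(limited, flakyOperation(5)));

        // Attempt limit of -1: only the max duration bounds the retries.
        SimpleRetryPolicy unlimited = new SimpleRetryPolicy(-1, window, backoff);
        System.out.println("unlimited attempts: " + runWithRetries(unlimited, flakyOperation(5)));
    }
}
```

With an operation that fails five times before succeeding, the policy limited to 3 attempts gives up after the third attempt, while the policy with max attempts set to -1 keeps retrying and succeeds on the sixth attempt, well inside the time window.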
