Martin Kleppmann created SAMZA-174:
--------------------------------------

             Summary: Refactor retry loops into a common abstraction
                 Key: SAMZA-174
                 URL: https://issues.apache.org/jira/browse/SAMZA-174
             Project: Samza
          Issue Type: Improvement
          Components: kafka
    Affects Versions: 0.6.0
            Reporter: Martin Kleppmann
            Assignee: Martin Kleppmann
             Fix For: 0.7.0


There are various places in Samza where something needs to be retried until it 
eventually succeeds. At the moment, those retry loop implementations are a bit 
inconsistent and ad-hoc. Some of the problems they have:

* Some use exponential backoff, others just use a fixed delay. As far as I can
see, exponential backoff would be appropriate in most places (faster recovery
from transient errors, but without hammering a failed service in case of a
persistent error).
* Inconsistent handling of thread interrupts (InterruptedException and related
exceptions).
* Some retry loops do exponential backoff, but don't reset the backoff if the 
operation succeeds. So if the job has an error, then runs fine for a few hours, 
then experiences another error, the delay on the second error is longer than it 
should be.
* Subtle control flow makes it difficult to tell by looking at the code whether 
resources are being freed correctly in all cases. Simpler control flow such as 
try-finally would make it easier to see in code review whether we are leaking 
resources (e.g. SAMZA-101).
* Catching Throwable (rather than Exception) catches more than it should. For
example, scala.runtime.NonLocalReturnControl (a non-exception Throwable that
Scala uses internally for flow control) can trigger a retry, which makes no
sense.
* Testability is poor. For example, you can't easily stub out Thread.sleep in a
unit test.
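
To illustrate the backoff-reset point above, here is a minimal sketch in Java.
ExponentialBackoff, onFailure and onSuccess are hypothetical names chosen for
this example, not existing Samza classes or methods, and the delay values are
arbitrary.

```java
/** Hypothetical helper (not an existing Samza class): delay tracking for a retry loop. */
final class ExponentialBackoff {
    private final long initialDelayMs;
    private final long maxDelayMs;
    private long nextDelayMs;

    ExponentialBackoff(long initialDelayMs, long maxDelayMs) {
        if (initialDelayMs <= 0 || maxDelayMs < initialDelayMs) {
            throw new IllegalArgumentException("need 0 < initialDelayMs <= maxDelayMs");
        }
        this.initialDelayMs = initialDelayMs;
        this.maxDelayMs = maxDelayMs;
        this.nextDelayMs = initialDelayMs;
    }

    /** Call after a failure: returns the delay to wait now and doubles the next one, up to the cap. */
    long onFailure() {
        long current = nextDelayMs;
        // Compare against maxDelayMs / 2 so that the doubling cannot overflow.
        nextDelayMs = (nextDelayMs > maxDelayMs / 2) ? maxDelayMs : nextDelayMs * 2;
        return current;
    }

    /** Call after a success, so that a later, unrelated error starts from the short delay again. */
    void onSuccess() {
        nextDelayMs = initialDelayMs;
    }

    public static void main(String[] args) {
        ExponentialBackoff backoff = new ExponentialBackoff(100, 1000);
        for (int i = 0; i < 5; i++) {
            System.out.println(backoff.onFailure());
        }
        backoff.onSuccess();
        System.out.println(backoff.onFailure());
    }
}
```

Keeping the reset in one place means a job that fails, runs fine for hours, and
then fails again would wait the initial delay on the second error rather than a
delay inflated by the first one.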

A better solution would be to have a single, well-tested implementation of a
retry loop, and to have all code that needs to retry an operation use that
implementation. In tests, a mock implementation can be passed in, which avoids
real sleeps and allows a greater variety of scenarios to be tested.
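
As a rough sketch of what such an abstraction could look like (in Java, under
assumptions: RetryLoop, Sleeper and run are hypothetical names invented for
this example, not a proposed or existing Samza API), the sleep can be injected
so that a test records the delays instead of waiting for them:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

/** Hypothetical retry abstraction; names and signatures are illustrative only. */
final class RetryLoop {
    /** Stands in for Thread.sleep so that tests can pass in a fake. */
    interface Sleeper {
        void sleep(long millis) throws InterruptedException;
    }

    private final Sleeper sleeper;
    private final long initialDelayMs;
    private final long maxDelayMs;

    RetryLoop(Sleeper sleeper, long initialDelayMs, long maxDelayMs) {
        this.sleeper = sleeper;
        this.initialDelayMs = initialDelayMs;
        this.maxDelayMs = maxDelayMs;
    }

    /** Retries the operation until it succeeds or the thread is interrupted. */
    <T> T run(Callable<T> operation) throws InterruptedException {
        long delayMs = initialDelayMs; // starts fresh on every call to run
        while (true) {
            try {
                return operation.call();
            } catch (InterruptedException e) {
                throw e; // an interrupt is a request to stop, so never retry it
            } catch (Exception e) {
                // Exception only: Errors and control-flow Throwables propagate.
                sleeper.sleep(delayMs);
                delayMs = (delayMs > maxDelayMs / 2) ? maxDelayMs : delayMs * 2;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Fake sleeper: records the requested delays instead of sleeping.
        List<Long> sleeps = new ArrayList<>();
        RetryLoop loop = new RetryLoop(ms -> sleeps.add(ms), 100, 1000);
        int[] calls = {0};
        String result = loop.run(() -> {
            if (++calls[0] < 4) {
                throw new IllegalStateException("transient failure");
            }
            return "ok";
        });
        System.out.println(result + " after sleeps " + sleeps);
    }
}
```

Production code could pass Thread::sleep as the Sleeper. In the example above
the operation fails three times, so the fake sleeper records delays of 100, 200
and 400 ms without any real waiting.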



