[ 
https://issues.apache.org/jira/browse/SAMZA-174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martin Kleppmann resolved SAMZA-174.
------------------------------------

    Resolution: Fixed

I've committed this. Resolving.

> Refactor retry loops into a common abstraction
> ----------------------------------------------
>
>                 Key: SAMZA-174
>                 URL: https://issues.apache.org/jira/browse/SAMZA-174
>             Project: Samza
>          Issue Type: Improvement
>          Components: kafka
>    Affects Versions: 0.6.0
>            Reporter: Martin Kleppmann
>            Assignee: Martin Kleppmann
>             Fix For: 0.7.0
>
>         Attachments: SAMZA-174.3.patch, SAMZA-174.patch
>
>
> There are various places in Samza where something needs to be retried until 
> it eventually succeeds. At the moment, those retry loop implementations are a 
> bit inconsistent and ad-hoc. Some of the problems they have:
> * Some use exponential backoff, others just use a fixed delay. As far as I 
> can see, exponential backoff would be appropriate in most places (faster 
> recovery from transient errors, but without hammering a failed service in 
> case of a persistent error)
> * Inconsistent handling of thread interrupts (InterruptedException and 
> friends)
> * Some retry loops do exponential backoff, but don't reset the backoff if the 
> operation succeeds. So if the job has an error, then runs fine for a few 
> hours, then experiences another error, the delay on the second error is 
> longer than it should be.
> * Subtle control flow makes it difficult to tell by looking at the code 
> whether resources are being freed correctly in all cases. Simpler control 
> flow such as try-finally would make it easier to see in code review whether 
> we are leaking resources (e.g. SAMZA-101).
> * Catching Throwable (rather than Exception) may catch more than we should, 
> so e.g. scala.runtime.NonLocalReturnControl (a non-exception Throwable used 
> internally by Scala for flow control) can cause a retry, which makes no sense.
> * Testability is not good. You can't easily stub out Thread.sleep in a unit 
> test, for example.
> A better solution would be to have a single, well-tested implementation of a 
> retry loop, and to make everyone that needs to retry stuff use that 
> implementation. In tests, a mock implementation can be passed in, which 
> avoids real sleeps in tests, and allows a greater variety of scenarios to be 
> tested.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to