[jira] [Updated] (SAMZA-174) Refactor retry loops into a common abstraction

Martin Kleppmann (JIRA) Thu, 20 Mar 2014 16:44:27 -0700

     [ 
https://issues.apache.org/jira/browse/SAMZA-174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Martin Kleppmann updated SAMZA-174:
-----------------------------------

    Attachment: SAMZA-174.3.patch

bq. 1. I think we can remove the InterrutpedException in KafkaSystemConsumer 
(per-comment in RB).

Ok, sounds good. I've removed the surrounding try/catch. Nice to simplify the 
code a bit further.

bq. 2. Did you run hello-samza successfully with this patch applied?

Yeah, I've run it and not noticed any problems or anything unusual in the logs.

Attaching the updated patch, rebased onto master. Thanks for your review!

> Refactor retry loops into a common abstraction
> ----------------------------------------------
>
>                 Key: SAMZA-174
>                 URL: https://issues.apache.org/jira/browse/SAMZA-174
>             Project: Samza
>          Issue Type: Improvement
>          Components: kafka
>    Affects Versions: 0.6.0
>            Reporter: Martin Kleppmann
>            Assignee: Martin Kleppmann
>             Fix For: 0.7.0
>
>         Attachments: SAMZA-174.3.patch, SAMZA-174.patch
>
>
> There are various places in Samza where something needs to be retried until 
> it eventually succeeds. At the moment, those retry loop implementations are a 
> bit inconsistent and ad-hoc. Some of the problems they have:
> * Some use exponential backoff, others just use a fixed delay. As far as I 
> can see, exponential backoff would be appropriate in most places (faster 
> recovery from transient errors, but without hammering a failed service in 
> case of a persistent error)
> * Inconsistent handling of thread interrupts (InterruptedException and 
> friends)
> * Some retry loops do exponential backoff, but don't reset the backoff if the 
> operation succeeds. So if the job has an error, then runs fine for a few 
> hours, then experiences another error, the delay on the second error is 
> longer than it should be.
> * Subtle control flow makes it difficult to tell by looking at the code 
> whether resources are being freed correctly in all cases. Simpler control 
> flow such as try-finally would make it easier to see in code review whether 
> we are leaking resources (e.g. SAMZA-101).
> * Catching Throwable (rather than Exception) may catch more than we should, 
> so e.g. scala.runtime.NonLocalReturnControl (a non-exception Throwable used 
> internally by Scala for flow control) can cause a retry, which makes no sense.
> * Testability is not good. You can't easily stub out Thread.sleep in a unit 
> test, for example.
> A better solution would be to have a single, well-tested implementation of a 
> retry loop, and to make everyone that needs to retry stuff use that 
> implementation. In tests, a mock implementation can be passed in, which 
> avoids real sleeps in tests, and allows a greater variety of scenarios to be 
> tested.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (SAMZA-174) Refactor retry loops into a common abstraction

Reply via email to