Let's talk retries

Katherine Cox-Buday Mon, 08 Aug 2016 17:22:51 -0700

Hey All,

We currently have 3 ways we're performing retries in Juju:


1. Archaic, custom, intra-package retry code.
2. github.com/juju/utils::{Countdown,BackoffTimer}
3. github.com/juju/retry (current best practice)

I have just touched some code which fits #1, and when I was attempting to 
retrofit it to #3, I found it a little obtuse. Here's why:

In my case (and I assume most), I wanted something like this:

for range retries { // Loop should break if our parent is cancelled
        // Do something
    // Success, retry, or fatal error
}

To implement this, I do something like this:

args := retry.CallArgs{
    Func: func() error {
        // Body of my loop
    },
    BackoffFunc: func(delay time.Duration, attempt int) time.Duration {
        if attempt == 1 {
            return delay
        }
        return delay * factor
    },
    Stop: dying,
    // etc...
}

Call(args)

We attempt to encapsulate every scenario for retries in 
github.com/juju/retry/::CallArgs. We have variables to determine if errors are 
fatal, to notify things, how to back off, etc. This feels very heavy-weight to 
me, and a bit like the monolith model rather than the unix model. I would 
prefer the logic be left to the caller (i.e. do I break out of this loop? do I 
notify something?), and the interface into a retry strategy be much simpler, 
say a channel like in time.Tick(time.Duration) <-chan time.Time.

How would folks feel about something like this?

func BackoffTick(done <-chan struct{}, clock clock.Cock, delay time.Duration, 
factor int64) <-chan time.Time {
        signalStream := make(chan time.Time)
        go func() {
                defer close(signalStream)
                for {
                        select {
                        case <-done:
                                return
                        case signalStream <- time.After(delay):
                                delay = time.Duration(delay.Nanoseconds() * 
factor)
                        }
                }
        }()
        return signalStream
}

With this, the above becomes:

for range BackoffTick(dying, myClock, 1*time.Second, 2) {
    // Determination of fatality and notifications happen here
}

If you want a max retry concept, you do this:

for range Take(done, 3, BackoffTick(dying, myClock, 1*time.Second, 2)) {
    // Determination of fatality and notifications happen here
}

If you want a max duration you do this:

done = Timeout(done, 5*time.Minute)
for range Take(done, 3, BackoffTick(dying, myClock, 1*time.Second, 2)) {
    // Determination of fatality and notifications happen here
}

Functionally, what's in juju/retry is fine; and I can stuff anything I want 
into the function. It just feels a little odd to use in that I must put the 
body of my loop in a function, and I dislike that the CallArgs struct attempts 
to encapsulate a bunch of different scenarios.

Thoughts?

Also, is it on the roadmap to address the inconsitant retries throughout Juju? 
I almost used utils.BackoffTimer until I started looking deeper. I'm going to 
submit a PR to flag it as deprecated.

-- 
Katherine

-- 
Juju-dev mailing list
[email protected]
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev

Let's talk retries

Reply via email to