On Tue, Jan 19, 2016 at 3:14 PM, James Page <[email protected]> wrote:
>
> I think this is a dangerous behaviour to introduce to Juju; a hook error
> should be a signal to an end user that something really bad happened, and
> that they need to dig in further (preferably with points from status
> messages); if the function that a hook is performing is re-tryable, that
> needs to be handled in charm and not by Juju IMHO.
>
There are a few problems with this.

0) The function that a hook is performing *must* be retryable anyway. Hooks need to be idempotent; we guarantee at-least-once execution, not at-most-once.

1) As a user, what a hook error means in practice is "retry the hook" (good thing all those hooks are idempotent...). Most users aren't in a position to debug their charm if it goes wrong, so their only actual interaction is basically a thoughtless pavlovian response, the absence of which can leave an environment needlessly hosed until a human notices it. May as well automate it for better UX *and* happier outcomes.

2) In any given hook, the ratio of known errors to possible errors is approximately 0:1 [0]. Those infinitesimally few known errors should indeed set statuses before failing out (even if you have to look in status history to see them); but we have to be mindful of the vast majority of cases, where we have *no idea* what could have gone wrong. And in that case, the only functional response is to retry -- some unknown errors may be fatal, but to *assume* they are risks locking up the system on every transient blip.

3) Finally, now that you have the choice, I'd advise against in-hook retries: (i) the longer you sit in one hook retrying, the longer all colocated units are blocked [1]; and (ii) delegating the retries to the infrastructure lets you write much much cleaner code [2].

Are there any concerns that I've missed?

> Specifically I was testing some changes to the odl-controller charm; this
> feature covered up a race in the charm hook code accessing the API of ODL,
> which I failed to notice the first few times I deployed (not paying
> attention due to multi-tasking), and then had me scratching my head as to
> what was going on when I started to notice the hook failure.
>

You say "covered up a race", I say "automatically resolved the problem for you" :-).

Cheers
William

[0] this applies to any code really, inside or outside juju, it's not specific to hooks at all.
[1] and while it may not be *common* I'm pretty sure it'd be *possible* for a hook to deadlock like this; would prefer not to encourage that.

[2] this is also widely applicable: adding retry logic *within* an idempotent operation is basically always worse than building independent operation-retrying infrastructure and reusing that where necessary.
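
To make the point in [2] concrete, here is a minimal sketch in Python. It is not Juju's actual retry implementation: `run_with_retries`, `FlakyService` and the `config_changed` function are hypothetical names invented for illustration, and the exponential backoff is just one plausible policy. The hook body stays free of retry logic and is safe to repeat because it is idempotent; the retrying lives in one reusable helper.

```python
import time
from typing import Callable


def run_with_retries(operation: Callable[[], None],
                     max_attempts: int = 5,
                     base_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Call operation until it succeeds; return the number of attempts used.

    Hypothetical helper: re-raises the last error once max_attempts is
    exhausted, so a persistent failure still surfaces to the caller.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            operation()
            return attempt
        except Exception:
            if attempt == max_attempts:
                raise
            # Exponential backoff between attempts (an assumed policy).
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")


class FlakyService:
    """Stand-in for an external API that is not ready for the first few calls."""

    def __init__(self, failures_left: int) -> None:
        self.failures_left = failures_left
        self.options: dict = {}

    def set_option(self, key: str, value: int) -> None:
        if self.failures_left > 0:
            self.failures_left -= 1
            raise ConnectionError("service not ready yet")
        self.options[key] = value


def config_changed(service: FlakyService) -> None:
    """Idempotent hook body: running it twice leaves the same end state."""
    service.set_option("port", 8080)


if __name__ == "__main__":
    service = FlakyService(failures_left=2)
    attempts = run_with_retries(lambda: config_changed(service), base_delay=0.0)
    print(attempts, service.options)
```

The hook itself contains no loops, sleeps or error handling; any other idempotent operation can reuse the same helper unchanged, which is the "cleaner code" argument from point 3.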
