On Wed, Jan 20, 2016 at 2:42 PM, Dean Henrichsmeyer <[email protected]> wrote:
> Hi,
>
> It seems the original point James was making is getting missed. No one is
> arguing over the value of being able to retry and/or idempotent hooks.
> Yes, you should be able to retry them and yes nothing should break if you
> run them over and over.
>
> The point made is that Juju shouldn't be automatically retrying them. The
> argument of "no one knows what went wrong so Juju automatically retrying
> them is a better experience" doesn't work. The intelligence of the stack in
> question, regardless of what it is, goes in the charms. If you start
> conflating and mixing up where the intelligence goes then creating,
> running, and debugging those distributed systems will be a nightmare.
>

Hook errors *will* happen, and often for transient reasons. In handling
this, we can choose between "users retry without understanding the details"
and "juju retries without understanding the details" [0]. I'd be happy to
make the behaviour configurable, for the rare cases when the user *does*
understand the details and wants full and detailed control, but I don't
think that's the common case.

> The magic should only be in Juju's ability to effectively drive the models
> and intelligence encoded in the charms. It shouldn't make assumptions about
> what that intelligence is or what those models require.
>

Stopping on hook error can only *prevent* those charms from applying their
intelligence. No more hooks to be run => no more opportunity to react. If a
charm wants to be smart about errors, it needs to detect the errors it
*knows* about, and react to those by setting status; and to move on
*without* failing the hook, thereby giving subsequent hooks an opportunity
to be smart.

Ultimately, it comes down to the fact that there's *always* another error
case you haven't considered. If you depend on the charmer to implement
retries for specific errors, that's essentially a whitelist, and they're
stuck playing whack-a-mole forever [1]. But if the charmer can depend on
external retries, they only have to worry about maintaining a
definitely-fatal blacklist and reporting those conditions in status.

Am I making any sense here?

Cheers
William

[0] or "the system stays broken forever", I suppose :).

[1] I imagine the rational approach there is to give up, and start
whitelisting by operation rather than by error; i.e. to accept that most
errors are unknown/transient and should be dumbly retried. And given that,
why should every charmer have to roll their own retries?
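
For illustration only, the blacklist approach described above might look roughly like the Python sketch below. Every name in it is hypothetical and not part of Juju or any charm library: FatalConfigError stands for a condition the charm knows to be fatal, set_status is a stand-in that just records what would be reported, and retry_externally plays the part of whatever sits outside the charm and retries failed hooks. The status strings are placeholders as well.

```python
import time


class FatalConfigError(Exception):
    """Hypothetical: a condition the charm *knows* is fatal."""


def set_status(state, message, log):
    # Stand-in for reporting status; it only records the call.
    log.append((state, message))


def run_hook(hook, log):
    """Charm side: report known-fatal errors in status, without failing."""
    try:
        hook()
    except FatalConfigError as err:
        set_status("blocked", str(err), log)
        return  # exit cleanly so that later hooks still get to run
    # Any other exception propagates: it is unknown, so the charm
    # leaves it to whoever is running the hook.
    set_status("active", "ready", log)


def retry_externally(hook, log, attempts=5, delay=0.0):
    """Runner side (assumed): dumbly retry any error the hook let through."""
    for attempt in range(1, attempts + 1):
        try:
            run_hook(hook, log)
            return attempt  # number of attempts it took
        except Exception:
            if attempt == attempts:
                raise  # out of retries; surface the error
            time.sleep(delay)
```

The charm only has to enumerate the failures it is sure about; everything else is retried without the charm needing to know why it failed.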
--
Juju-dev mailing list
[email protected]
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/juju-dev
