On 26 January 2015 at 20:54, Mario Splivalo <[email protected]> wrote:
> Hello!
>
> Currently juju provides a relation-departed hook, which will fire on all
> units that are part of a relation, and a relation-broken hook, which will
> fire on the unit that just departed the relation.
>
> The problem arises when we have a multi-unit service peered. Consider the
> MongoDB charm, where we usually have a replica set formed from three or
> more units. When a unit is destroyed (with 'juju remove-unit'), first the
> relation-departed hook fires between the departing unit and all the
> 'staying' units. Then, on the departed unit, the relation-broken hook is
> fired. But if we need to do some work on the departing unit before it
> leaves the relation, there is no way to do so: when the
> 'relation-departed' hook is called, there is no way of telling (observing
> from within the hook) whether we are running on the unit that is
> departing, or on a unit that is 'staying' within the relation.
>
> A '-before-departed' hook would, I think, solve this. First the
> '-before-departed' hook would fire on the departing unit. Then the
> '-departed' hook would fire against the departing and staying units. And
> lastly, as it is now, the -broken hook would fire.
>
> Ignoring the (most likely wrong) nomenclature of the proposed hook, what
> are your opinions on the matter?
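The ambiguity Mario describes can be shown with a minimal sketch (Python here for illustration; a real charm hook would read these from its environment, and everything beyond the two JUJU_* variables is hypothetical):

```python
def describe_departure(env):
    """Sketch of what a peer relation-departed hook can and cannot learn.

    `env` stands in for os.environ inside the hook. JUJU_UNIT_NAME and
    JUJU_REMOTE_UNIT are real hook environment variables; this helper
    itself is a hypothetical illustration, not charm code.
    """
    local = env["JUJU_UNIT_NAME"]     # the unit running this hook
    remote = env["JUJU_REMOTE_UNIT"]  # the peer this event concerns
    # The hook knows which remote peer the event is about...
    # ...but nothing in the environment says "this local unit is the one
    # being destroyed" -- exactly the gap a '-before-departed' hook (or
    # an explicit departing flag) would fill.
    return {"local": local, "departing": remote}

# Example: the hook as seen on a 'staying' replica-set member.
info = describe_departure({
    "JUJU_UNIT_NAME": "mongodb/0",
    "JUJU_REMOTE_UNIT": "mongodb/2",
})
```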
I've been working on similar issues.

When the peer relation-departed hook is fired, the unit running it knows that $REMOTE_UNIT is leaving the cluster. $REMOTE_UNIT may not be alive - we may be removing a failed unit from the service. $REMOTE_UNIT may be alive but uncontactable - some form of network partition has occurred.

When the peer relation-broken hook is fired, the unit running it knows that it is leaving the cluster, and it decommissions itself. However, this hook may never be run if the unit has failed, or it may be impossible to complete successfully (eg. a corrupted filesystem).

I agree that this is not rich enough to remove units robustly. The peer relation-departed hooks are not particularly useful to me, as they cannot know in advance whether the relation-broken hook will complete successfully. It is the peer relation-broken hook that is responsible for properly decoupling the unit from the service, and this works fine if the unit is healthy. The problem of course is if the departing unit *has* failed, because no subsequent hooks are called to repair the damaged cluster.

As a concrete example, to remove a cassandra node from a cluster:

- First, run 'nodetool decommission' on the departing node. This streams its partitions to the remaining nodes.

- Second, if 'nodetool decommission' failed or could not be run, run 'nodetool removenode' on one of the other nodes. This removes the failed node from the ring, and the remaining nodes will rebalance and rebuild using redundant copies of the data. Data may be lost if it was stored with a replication factor of 1, or if updates only waited for an acknowledgement from one node.

An extra hook as you suggest would help me solve this issue. But what would also solve it is juju leadership (currently in development). When the lead unit runs its peer relation-departed hook, it connects to the departing unit and runs the decommissioning process on its behalf. If it is unable to connect, it assumes the node has failed and cleans up.
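The two-step cassandra removal above can be sketched as a simple fallback routine (a sketch only; `run_decommission` and `run_removenode` are hypothetical stand-ins for invoking nodetool on the departing node and on a surviving node respectively):

```python
def remove_cassandra_node(run_decommission, run_removenode, host_id):
    """Try a clean decommission first; fall back to removenode.

    run_decommission: callable that runs 'nodetool decommission' on the
        departing node, raising if the node is dead or unreachable.
    run_removenode: callable(host_id) that runs 'nodetool removenode'
        on one of the remaining nodes.
    Both callables are hypothetical stand-ins for real remote execution.
    """
    try:
        run_decommission()
        return "decommissioned"  # partitions streamed to remaining nodes
    except Exception:
        # Departing node is gone or unreachable: evict it from the ring
        # and let the survivors rebuild from redundant replicas.
        run_removenode(host_id)
        return "removed"

def unreachable():
    raise RuntimeError("node unreachable")

# Simulate a failed node: decommission raises, so we fall back.
evicted = []
result = remove_cassandra_node(unreachable, evicted.append, "c0ffee")
# result == "removed" and evicted == ["c0ffee"]
```

With a healthy departing node (a callable that simply returns), the routine reports "decommissioned" and never touches removenode.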
It can even notify the remaining non-leader units that the removed unit is gone from the cluster, giving them a chance to update their configuration if necessary.

You can't really do this without the leadership feature, as you can't coordinate which of the remaining units is responsible for decommissioning the departing unit (and they would trip over each other if they all attempted to decommission it).

The edge case in my approach is of course if the departing unit is live, but for some reason the leader cannot connect to it. Maybe your inter-DC links have gone down. However, there are similar issues with the extra hook: if your -before-departed hook fails to run, how long should juju wait until it gives up and triggers the -departed hooks?

Perhaps what is needed here is instead an extra hook, run on the remaining units if the -broken hook could not be run successfully? Let's call it relation-failed. It could be fired when we know the vm is gone and the -broken hook was not successfully run.

-- 
Stuart Bishop <[email protected]>

-- 
Juju-dev mailing list
[email protected]
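The leader-driven handling described above can be sketched as follows (a sketch under the assumption that leadership is available; `connect`, `decommission` and `clean_up` are hypothetical callables standing in for ssh-ing to the peer, running nodetool on its behalf, and repairing the cluster):

```python
def on_peer_departed(is_leader, connect, decommission, clean_up, departing_unit):
    """Leader-driven handling of a departing peer (sketch).

    is_leader: whether this unit currently holds juju leadership (the
        in-development feature mentioned above; a plain bool stands in
        for the real query here).
    connect: callable(unit) -> connection, raising if unreachable.
    decommission / clean_up: hypothetical callables for the two paths.
    """
    if not is_leader:
        # Non-leaders stand back, so units don't trip over each other.
        return "ignored"
    try:
        conn = connect(departing_unit)
    except Exception:
        # Assume the unit has failed; repair the cluster without it.
        clean_up(departing_unit)
        return "cleaned-up"
    # Unit is reachable: run its decommissioning on its behalf.
    decommission(conn)
    return "decommissioned"

def no_route(unit):
    raise OSError("cannot reach " + unit)

repaired = []
# Leader, departing unit unreachable: clean up on its behalf.
outcome = on_peer_departed(True, no_route, lambda c: None,
                           repaired.append, "cassandra/3")
# outcome == "cleaned-up" and repaired == ["cassandra/3"]
```

A non-leader calling the same function gets "ignored" immediately, which is the coordination property the leadership feature buys us.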
