On Tue, Feb 19, 2013 at 06:18:47AM +0100, Guido Trotter wrote: > On Mon, Feb 18, 2013 at 9:43 AM, Iustin Pop <[email protected]> wrote: > > Hi, > > > On Fri, Feb 15, 2013 at 05:49:55PM +0100, Guido Trotter wrote: > >> - Specify that there will be options for selecting nodes by at least > >> nodegroups and tags, rather than just individually. > >> - Specify a better handling for non-redundant instances (eg. plain or > >> file) which today are simply ignored > >> - Specify that the rolling maintenance behavior is triggered by > >> instances being up, but also overridable > >> - Remove execution of rolling maintenances altogether, as it is deemed > >> unsafe in the current version, and move it to future work, discuss the > >> requirements that were pointed out for it to be safe. > >> > >> Cosmetic: > >> - Fix numbered list, which were rendered incorrectly in the HTML version > >> > >> Signed-off-by: Guido Trotter <[email protected]> > >> --- > >> doc/design-hroller.rst | 98 > >> ++++++++++++++++++++++++++---------------------- > >> 1 file changed, 54 insertions(+), 44 deletions(-) > >> > >> diff --git a/doc/design-hroller.rst b/doc/design-hroller.rst > >> index 632531b..6cedddc 100644 > >> --- a/doc/design-hroller.rst > >> +++ b/doc/design-hroller.rst > >> @@ -26,6 +26,28 @@ reboots). > >> Proposed changes > >> ================ > >> > >> +New options > >> +----------- > >> + > >> +- HRoller should be able to operate on single nodegroups (-G flag) or > >> + select its target node through some other mean (eg. via a tag, or a > >> + regexp). (Note that individual node selection is already possible via > >> + the -O flag, that makes hroller ignore a node altogether). > >> +- HRoller should handle non redundant instances: currently these are > >> + ignored but there should be a way to select its behavior between "it's > >> + ok to reboot a node when a non-redundant instance is on it" > >> + (``--allow-non-redundant-reboots``) or "skip nodes with non-redundant > >> + instances". This will only be selectable globally, and not per > >> + instance. > >> +- The instance status will automatically make hroller create a rolling > >> + maintenance (as described below) or not (the maintenance will be > >> + rolling if any instance is up). It will be possible to override this > > > > This "or not (the maintenance will be rolling if any instance is up)" is > > a bit confusing, as the "not" is opposite to the text in parenthesis. > > What about: > > > > or not (only if all instances are down). > > > I think I will rephrase completely. How about: > > - Hroller will make sure to keep any instance which is up in its > current state, via live migrations, unless explicitely overridden.
Sgtm. > >> + for testing purposes and to force calculation of a non-rolling > >> + maintenance also if some instances are up > >> + (``--ignore-instance-status-up``). Again, this will be only selectable > > > > here⦠> > > >> + globally, and it won't be possible to override the status for each > >> + single instance. > >> + > >> > >> Calculating rolling maintenances > >> -------------------------------- > >> @@ -38,9 +60,14 @@ Down instances > >> ++++++++++++++ > >> > >> If an instance was shutdown when the maintenance started it will be > >> -ignored. This allows avoiding needlessly moving its primary around, > >> -since it won't suffer a downtime anyway. > >> +considered for avoiding contemporary reboot of its primary and secondary > >> +nodes, but will *not* be considered as a target for the node evacuation. > >> +This allows avoiding needlessly moving its primary around, since it > >> +won't suffer a downtime anyway. > >> > >> +Note that a node with non-redundant instances will only ever be > >> +considered good for rolling-reboot if these are down *and* the > >> +``--allow-non-redundant-reboots`` is set. > > > > and here you're using explicit command line options. I think in a design > > document these should not be called as such. > > > > Also, the wording "and the --allow-non-redundant-reboots is set" is the > > first time this options is mentioned, so introducing it with "the > > option" is wrong, IMHO. > > > > Ack, will remove the option names. > > >> > >> DRBD > >> ++++ > >> @@ -56,20 +83,20 @@ them (citation needed). As such we'll implement for > >> now just the > >> In order to do that we can use the following algorithm: > >> > >> 1) Compute node sets that don't contain both the primary and the > >> -secondary for any instance. This can be done already by the current > >> -hroller graph coloring algorithm: nodes are in the same set (color) if > >> -and only if no edge (instance) exists between them (see the > >> -:manpage:`hroller(1)` manpage for more details). > >> + secondary for any instance. This can be done already by the current > >> + hroller graph coloring algorithm: nodes are in the same set (color) > >> + if and only if no edge (instance) exists between them (see the > >> + :manpage:`hroller(1)` manpage for more details). > >> 2) Inside each node set calculate subsets that don't have any secondary > >> -node in common (this can be done by creating a graph of nodes that are > >> -connected if and only if an instance on both has the same secondary > >> -node, and coloring that graph) > >> + node in common (this can be done by creating a graph of nodes that > >> + are connected if and only if an instance on both has the same > >> + secondary node, and coloring that graph) > >> 3) It is then possible to migrate in parallel all nodes in a subset > >> -created at step 2, and then reboot/perform maintenance on them, and > >> -migrate back their original primaries, which allows the computation > >> -above to be reused for each following subset without N+1 failures being > >> -triggered, if none were present before. See below about the actual > >> -execution of the maintenance. > >> + created at step 2, and then reboot/perform maintenance on them, and > >> + migrate back their original primaries, which allows the computation > >> + above to be reused for each following subset without N+1 failures > >> + being triggered, if none were present before. See below about the > >> + actual execution of the maintenance. > >> > >> Non-DRBD > >> ++++++++ > >> @@ -99,45 +126,28 @@ algorithm might be safe. This perhaps would be a good > >> reason to consider > >> managing better RBD pools, if those are implemented on top of nodes > >> storage, rather than on dedicated storage machines. > >> > >> -Executing rolling maintenances > >> ------------------------------- > >> - > >> -Hroller accepts commands to run to do maintenance automatically. These > >> -are going to be run on the machine hroller runs on, and take a node name > >> -as input. They have then to gain access to the target node (via ssh, > >> -restricted commands, or some other means) and perform their duty. > >> - > >> -1) A command (--check-cmd) will be called on all selected online nodes > >> -to check whether a node needs maintenance. Hroller will proceed only on > >> -nodes that respond positively to this invocation. > >> -FIXME: decide about -D > >> -2) Hroller will evacuate the node of all primary instances. > >> -3) A command (--maint-cmd) will be called on a node to do the actual > >> -maintenance operation. It should do any operation needed to perform the > >> -maintenance including triggering the actual reboot. > >> -3) A command (--verify-cmd) will be called to check that the operation > >> -was successful, it has to wait until the target node is back up (and > >> -decide after how long it should give up) and perform the verification. > >> -If it's not successful hroller will stop and not proceed with other > >> -nodes. > >> -4) The master node will be kept last, but will not otherwise be treated > >> -specially. If hroller was running on the master node, care must be > >> -exercised as its maintenance will have interrupted the software itself, > >> -and as such the verification step will not happen. This will not > >> -automatically be taken care of, in the first version. An additional flag > >> -to just skip the master node will be present as well, in case that's > >> -preferred. > >> - > >> - > >> Future work > >> =========== > >> > >> +Hroller should become able to execute rolling maintenances, rather than > >> +just calculate them. For this to succeed properly one of the following > >> +must happen: > >> + > >> +- HRoller handles rolling maintenances that happen at the same time as > >> + unrelated cluster jobs, and thus recalculates the maintenance at each > >> + step > >> +- HRoller can selectively drain the cluster so it's sure that only the > >> + rolling maintenance can be going on > >> + > >> DRBD nodes' ``replace-disks``' functionality should be implemented. Note > >> that when we will support a DRBD version that allows multi-secondary > >> this can be done safely, without losing replication at any time, by > >> adding a temporary secondary and only when the sync is finished dropping > >> the previous one. > >> > >> +Non-redundant (plain or file) instances should have a way to be moved > >> +off as well (via drbd conversion or plain storage live migration). > > > > These can already be moved via gnt-instance move. Why introduce a new > > method? > > > > To do the movement without a reboot. > But it doesn't matter for this design, so I'll mention instance move as well. Ack, then LGTM with these changes. thanks! iustin
