Re: [PATCH master] HRoller design updates

Iustin Pop Tue, 19 Feb 2013 00:45:34 -0800

On Tue, Feb 19, 2013 at 06:18:47AM +0100, Guido Trotter wrote:
> On Mon, Feb 18, 2013 at 9:43 AM, Iustin Pop <[email protected]> wrote:
> 
> Hi,
> 
> > On Fri, Feb 15, 2013 at 05:49:55PM +0100, Guido Trotter wrote:
> >> - Specify that there will be options for selecting nodes by at least
> >>   nodegroups and tags, rather than just individually.
> >> - Specify a better handling for non-redundant instances (eg. plain or
> >>   file) which today are simply ignored
> >> - Specify that the rolling maintenance behavior is triggered by
> >>   instances being up, but also overridable
> >> - Remove execution of rolling maintenances altogether, as it is deemed
> >>   unsafe in the current version, and move it to future work, discuss the
> >>   requirements that were pointed out for it to be safe.
> >>
> >> Cosmetic:
> >> - Fix numbered list, which were rendered incorrectly in the HTML version
> >>
> >> Signed-off-by: Guido Trotter <[email protected]>
> >> ---
> >>  doc/design-hroller.rst |   98 ++++++++++++++++++++++++++----------------------
> >>  1 file changed, 54 insertions(+), 44 deletions(-)
> >>
> >> diff --git a/doc/design-hroller.rst b/doc/design-hroller.rst
> >> index 632531b..6cedddc 100644
> >> --- a/doc/design-hroller.rst
> >> +++ b/doc/design-hroller.rst
> >> @@ -26,6 +26,28 @@ reboots).
> >>  Proposed changes
> >>  ================
> >>
> >> +New options
> >> +-----------
> >> +
> >> +- HRoller should be able to operate on single nodegroups (-G flag) or
> >> +  select its target node through some other mean (eg. via a tag, or a
> >> +  regexp). (Note that individual node selection is already possible via
> >> +  the -O flag, that makes hroller ignore a node altogether).
> >> +- HRoller should handle non redundant instances: currently these are
> >> +  ignored but there should be a way to select its behavior between "it's
> >> +  ok to reboot a node when a non-redundant instance is on it"
> >> +  (``--allow-non-redundant-reboots``) or "skip nodes with non-redundant
> >> +  instances". This will only be selectable globally, and not per
> >> +  instance.
> >> +- The instance status will automatically make hroller create a rolling
> >> +  maintenance (as described below) or not (the maintenance will be
> >> +  rolling if any instance is up). It will be possible to override this
> >
> > This "or not (the maintenance will be rolling if any instance is up)" is
> > a bit confusing, as the "not" is opposite to the text in parenthesis.
> > What about:
> >
> >  or not (only if all instances are down).
> >
> I think I will rephrase completely. How about:
> 
> - Hroller will make sure to keep any instance which is up in its
> current state, via live migrations, unless explicitly overridden.


Sgtm.

> >> +  for testing purposes and to force calculation of a non-rolling
> >> +  maintenance also if some instances are up
> >> +  (``--ignore-instance-status-up``). Again, this will be only selectable
> >
> > here…
> >
> >> +  globally, and it won't be possible to override the status for each
> >> +  single instance.
> >> +
> >>
> >>  Calculating rolling maintenances
> >>  --------------------------------
> >> @@ -38,9 +60,14 @@ Down instances
> >>  ++++++++++++++
> >>
> >>  If an instance was shutdown when the maintenance started it will be
> >> -ignored. This allows avoiding needlessly moving its primary around,
> >> -since it won't suffer a downtime anyway.
> >> +considered for avoiding contemporary reboot of its primary and secondary
> >> +nodes, but will *not* be considered as a target for the node evacuation.
> >> +This allows avoiding needlessly moving its primary around, since it
> >> +won't suffer a downtime anyway.
> >>
> >> +Note that a node with non-redundant instances will only ever be
> >> +considered good for rolling-reboot if these are down *and* the
> >> +``--allow-non-redundant-reboots`` is set.
> >
> > and here you're using explicit command line options. I think in a design
> > document these should not be called as such.
> >
> > Also, the wording "and the --allow-non-redundant-reboots is set" is the
> > first time this option is mentioned, so introducing it with "the
> > option" is wrong, IMHO.
> >
> 
> Ack, will remove the option names.
> 
> >>
> >>  DRBD
> >>  ++++
> >> @@ -56,20 +83,20 @@ them (citation needed). As such we'll implement for now just the
> >>  In order to do that we can use the following algorithm:
> >>
> >>  1) Compute node sets that don't contain both the primary and the
> >> -secondary for any instance. This can be done already by the current
> >> -hroller graph coloring algorithm: nodes are in the same set (color) if
> >> -and only if no edge (instance) exists between them (see the
> >> -:manpage:`hroller(1)` manpage for more details).
> >> +   secondary for any instance. This can be done already by the current
> >> +   hroller graph coloring algorithm: nodes are in the same set (color)
> >> +   if and only if no edge (instance) exists between them (see the
> >> +   :manpage:`hroller(1)` manpage for more details).
> >>  2) Inside each node set calculate subsets that don't have any secondary
> >> -node in common (this can be done by creating a graph of nodes that are
> >> -connected if and only if an instance on both has the same secondary
> >> -node, and coloring that graph)
> >> +   node in common (this can be done by creating a graph of nodes that
> >> +   are connected if and only if an instance on both has the same
> >> +   secondary node, and coloring that graph)
> >>  3) It is then possible to migrate in parallel all nodes in a subset
> >> -created at step 2, and then reboot/perform maintenance on them, and
> >> -migrate back their original primaries, which allows the computation
> >> -above to be reused for each following subset without N+1 failures being
> >> -triggered, if none were present before. See below about the actual
> >> -execution of the maintenance.
> >> +   created at step 2, and then reboot/perform maintenance on them, and
> >> +   migrate back their original primaries, which allows the computation
> >> +   above to be reused for each following subset without N+1 failures
> >> +   being triggered, if none were present before. See below about the
> >> +   actual execution of the maintenance.
> >>
> >>  Non-DRBD
> >>  ++++++++
> >> @@ -99,45 +126,28 @@ algorithm might be safe. This perhaps would be a good reason to consider
> >>  managing better RBD pools, if those are implemented on top of nodes
> >>  storage, rather than on dedicated storage machines.
> >>
> >> -Executing rolling maintenances
> >> -------------------------------
> >> -
> >> -Hroller accepts commands to run to do maintenance automatically. These
> >> -are going to be run on the machine hroller runs on, and take a node name
> >> -as input. They have then to gain access to the target node (via ssh,
> >> -restricted commands, or some other means) and perform their duty.
> >> -
> >> -1) A command (--check-cmd) will be called on all selected online nodes
> >> -to check whether a node needs maintenance. Hroller will proceed only on
> >> -nodes that respond positively to this invocation.
> >> -FIXME: decide about -D
> >> -2) Hroller will evacuate the node of all primary instances.
> >> -3) A command (--maint-cmd) will be called on a node to do the actual
> >> -maintenance operation.  It should do any operation needed to perform the
> >> -maintenance including triggering the actual reboot.
> >> -3) A command (--verify-cmd) will be called to check that the operation
> >> -was successful, it has to wait until the target node is back up (and
> >> -decide after how long it should give up) and perform the verification.
> >> -If it's not successful hroller will stop and not proceed with other
> >> -nodes.
> >> -4) The master node will be kept last, but will not otherwise be treated
> >> -specially. If hroller was running on the master node, care must be
> >> -exercised as its maintenance will have interrupted the software itself,
> >> -and as such the verification step will not happen. This will not
> >> -automatically be taken care of, in the first version. An additional flag
> >> -to just skip the master node will be present as well, in case that's
> >> -preferred.
> >> -
> >> -
> >>  Future work
> >>  ===========
> >>
> >> +Hroller should become able to execute rolling maintenances, rather than
> >> +just calculate them. For this to succeed properly one of the following
> >> +must happen:
> >> +
> >> +- HRoller handles rolling maintenances that happen at the same time as
> >> +  unrelated cluster jobs, and thus recalculates the maintenance at each
> >> +  step
> >> +- HRoller can selectively drain the cluster so it's sure that only the
> >> +  rolling maintenance can be going on
> >> +
> >>  DRBD nodes' ``replace-disks``' functionality should be implemented. Note
> >>  that when we will support a DRBD version that allows multi-secondary
> >>  this can be done safely, without losing replication at any time, by
> >>  adding a temporary secondary and only when the sync is finished dropping
> >>  the previous one.
> >>
> >> +Non-redundant (plain or file) instances should have a way to be moved
> >> +off as well (via drbd conversion or plain storage live migration).
> >
> > These can already be moved via gnt-instance move. Why introduce a new
> > method?
> >
> 
> To do the movement without a reboot.
> But it doesn't matter for this design, so I'll mention instance move as well.

Ack, then LGTM with these changes.

thanks!
iustin
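
For readers following the DRBD algorithm quoted above, steps 1 and 2 can be sketched roughly as below. This is only an illustration and not hroller's actual code: the function names (greedy_color, reboot_groups), the input shape (a list of (primary, secondary) pairs) and the use of a simple greedy coloring are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def greedy_color(nodes, edges):
    """Greedy graph coloring (hypothetical helper).

    Returns a list of node groups such that no edge joins two nodes
    of the same group. Not guaranteed to use the minimum number of colors.
    """
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    color = {}
    # Visit high-degree nodes first; ties are broken by name for determinism.
    for n in sorted(nodes, key=lambda n: (-len(adj[n]), n)):
        used = {color[m] for m in adj[n] if m in color}
        c = 0
        while c in used:
            c += 1
        color[n] = c
    groups = defaultdict(list)
    for n in sorted(nodes):
        groups[color[n]].append(n)
    return [groups[c] for c in sorted(groups)]


def reboot_groups(nodes, instances):
    """instances: (primary, secondary) pairs of DRBD instances (assumed shape)."""
    # Step 1: an instance is an edge between its primary and its secondary,
    # so no node set contains both nodes of any instance.
    node_sets = greedy_color(nodes, instances)

    # Map each node to the secondaries of the instances it is primary for.
    secondaries = defaultdict(set)
    for pri, sec in instances:
        secondaries[pri].add(sec)

    # Step 2: inside each set, connect nodes whose instances share a
    # secondary node, then color that graph to obtain the subsets.
    result = []
    for node_set in node_sets:
        clashes = [(a, b) for a, b in combinations(node_set, 2)
                   if secondaries[a] & secondaries[b]]
        result.extend(greedy_color(node_set, clashes))
    return result
```

Each returned subset is a candidate for step 3 (migrate, perform the maintenance, migrate back). For example, reboot_groups(["n1", "n2", "n3", "n4"], [("n1", "n2"), ("n3", "n2"), ("n2", "n4")]) keeps n1 and n3 in different subsets, since both would evacuate to n2.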
