On Mon, Feb 18, 2013 at 9:43 AM, Iustin Pop <[email protected]> wrote:

Hi,

> On Fri, Feb 15, 2013 at 05:49:55PM +0100, Guido Trotter wrote:
>> - Specify that there will be options for selecting nodes by at least
>>   nodegroups and tags, rather than just individually.
>> - Specify better handling for non-redundant instances (e.g. plain or
>>   file), which today are simply ignored
>> - Specify that the rolling maintenance behavior is triggered by
>>   instances being up, but also overridable
>> - Remove execution of rolling maintenances altogether, as it is deemed
>>   unsafe in the current version, and move it to future work, discussing
>>   the requirements that were pointed out for it to be safe.
>>
>> Cosmetic:
>> - Fix numbered lists, which were rendered incorrectly in the HTML version
>>
>> Signed-off-by: Guido Trotter <[email protected]>
>> ---
>>  doc/design-hroller.rst |   98 ++++++++++++++++++++++++++----------------------
>>  1 file changed, 54 insertions(+), 44 deletions(-)
>>
>> diff --git a/doc/design-hroller.rst b/doc/design-hroller.rst
>> index 632531b..6cedddc 100644
>> --- a/doc/design-hroller.rst
>> +++ b/doc/design-hroller.rst
>> @@ -26,6 +26,28 @@ reboots).
>>  Proposed changes
>>  ================
>>
>> +New options
>> +-----------
>> +
>> +- HRoller should be able to operate on single nodegroups (-G flag) or
>> +  select its target nodes through some other means (e.g. via a tag, or a
>> +  regexp). (Note that individual node selection is already possible via
>> +  the -O flag, which makes hroller ignore a node altogether.)
>> +- HRoller should handle non-redundant instances: currently these are
>> +  ignored, but there should be a way to choose between "it's
>> +  ok to reboot a node when a non-redundant instance is on it"
>> +  (``--allow-non-redundant-reboots``) or "skip nodes with non-redundant
>> +  instances". This will only be selectable globally, and not per
>> +  instance.
>> +- The instance status will automatically make hroller create a rolling
>> +  maintenance (as described below) or not (the maintenance will be
>> +  rolling if any instance is up). It will be possible to override this
>
> This "or not (the maintenance will be rolling if any instance is up)" is
> a bit confusing, as the "not" is opposite to the text in parenthesis.
> What about:
>
>  or not (only if all instances are down).
>
I think I will rephrase completely. How about:

- Hroller will make sure to keep any instance that is up in its
current state, via live migrations, unless explicitly overridden.



> ?
>
>> +  for testing purposes and to force calculation of a non-rolling
>> +  maintenance even if some instances are up
>> +  (``--ignore-instance-status-up``). Again, this will only be selectable
>
> here…
>
>> +  globally, and it won't be possible to override the status for each
>> +  single instance.
>> +
>>
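A minimal Python sketch of how the node selection filters proposed above could compose. The `Node` record, its fields and `select_nodes` are hypothetical illustrations, not hroller's data model or command line; the proposal does not say whether a regexp must match the whole node name, so the sketch assumes any part of it (`re.search`).

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical node record; field names are illustrative only.
    name: str
    group: str
    tags: set = field(default_factory=set)


def select_nodes(nodes, group=None, tag=None, regexp=None, ignored=()):
    """Return the names of the nodes matching every given filter.

    group:   only nodes in this node group (cf. the -G flag)
    tag:     only nodes carrying this tag
    regexp:  only nodes whose name matches this pattern anywhere
    ignored: node names to skip entirely (cf. the -O flag)
    """
    pattern = re.compile(regexp) if regexp else None
    selected = []
    for node in nodes:
        if node.name in ignored:
            continue
        if group is not None and node.group != group:
            continue
        if tag is not None and tag not in node.tags:
            continue
        if pattern is not None and not pattern.search(node.name):
            continue
        selected.append(node.name)
    return selected
```

For example, with nodes `node1` (group `default`, tag `rack1`), `node2` (group `default`) and `node3` (group `g2`, tag `rack1`), `select_nodes(nodes, tag="rack1")` yields `["node1", "node3"]`.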
>>  Calculating rolling maintenances
>>  --------------------------------
>> @@ -38,9 +60,14 @@ Down instances
>>  ++++++++++++++
>>
>>  If an instance was shutdown when the maintenance started it will be
>> -ignored. This allows avoiding needlessly moving its primary around,
>> -since it won't suffer a downtime anyway.
>> +considered to avoid rebooting its primary and secondary nodes at the
>> +same time, but will *not* be moved as part of the node evacuation.
>> +This avoids needlessly moving its primary around, since it won't
>> +suffer any downtime anyway.
>>
>> +Note that a node with non-redundant instances will only ever be
>> +considered good for rolling-reboot if these are down *and* the
>> +``--allow-non-redundant-reboots`` is set.
>
> and here you're using explicit command line options. I think in a design
> document these should not be referred to by name.
>
> Also, the wording "and the --allow-non-redundant-reboots is set" is the
> first time this option is mentioned, so introducing it with "the
> option" is wrong, IMHO.
>

Ack, will remove the option names.
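
To make the interplay of these rules concrete, a minimal Python sketch, assuming a hypothetical `Instance` record in which non-redundant (plain or file) instances have no secondary node. It encodes the two rules from the hunk above: down instances are never moved off a node, and a node holding non-redundant instances is only rebootable if all of them are down and the corresponding option is set. The fact that down DRBD instances still keep their primary and secondary from being rebooted together belongs to the node set computation in the DRBD section, so it is not shown here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    # Hypothetical instance record; field names are illustrative only.
    name: str
    primary: str
    secondary: Optional[str]  # None for non-redundant (plain/file) instances
    running: bool


def can_reboot(node: str, instances: List[Instance],
               allow_non_redundant: bool = False) -> bool:
    """A node holding non-redundant instances may only be rebooted if
    all of them are down and the (hypothetically named) option is set."""
    non_redundant = [i for i in instances
                     if i.primary == node and i.secondary is None]
    if not non_redundant:
        return True
    return allow_non_redundant and not any(i.running for i in non_redundant)


def instances_to_evacuate(node: str, instances: List[Instance]) -> List[str]:
    """Running redundant instances to migrate off `node` before the reboot.
    Down instances stay put, since they suffer no downtime anyway."""
    return [i.name for i in instances
            if i.primary == node and i.secondary is not None and i.running]
```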

>>
>>  DRBD
>>  ++++
>> @@ -56,20 +83,20 @@ them (citation needed). As such we'll implement for now just the
>>  In order to do that we can use the following algorithm:
>>
>>  1) Compute node sets that don't contain both the primary and the
>> -secondary for any instance. This can be done already by the current
>> -hroller graph coloring algorithm: nodes are in the same set (color) if
>> -and only if no edge (instance) exists between them (see the
>> -:manpage:`hroller(1)` manpage for more details).
>> +   secondary for any instance. This can be done already by the current
>> +   hroller graph coloring algorithm: nodes are in the same set (color)
>> +   if and only if no edge (instance) exists between them (see the
>> +   :manpage:`hroller(1)` manpage for more details).
>>  2) Inside each node set calculate subsets that don't have any secondary
>> -node in common (this can be done by creating a graph of nodes that are
>> -connected if and only if an instance on both has the same secondary
>> -node, and coloring that graph)
>> +   node in common (this can be done by creating a graph of nodes that
>> +   are connected if and only if an instance on both has the same
>> +   secondary node, and coloring that graph)
>>  3) It is then possible to migrate in parallel all nodes in a subset
>> -created at step 2, and then reboot/perform maintenance on them, and
>> -migrate back their original primaries, which allows the computation
>> -above to be reused for each following subset without N+1 failures being
>> -triggered, if none were present before. See below about the actual
>> -execution of the maintenance.
>> +   created at step 2, and then reboot/perform maintenance on them, and
>> +   migrate back their original primaries, which allows the computation
>> +   above to be reused for each following subset without N+1 failures
>> +   being triggered, if none were present before. See below about the
>> +   actual execution of the maintenance.
>>
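Steps 1 and 2 can be sketched in Python as below. This is an illustration under assumptions, not hroller's implementation: DRBD instances are given as hypothetical `(primary, secondary)` pairs, and a plain largest-degree-first greedy coloring stands in for whatever coloring heuristic hroller actually uses. Every instance contributes an edge, including down ones, so a primary and its secondary never end up in the same group.

```python
from collections import defaultdict


def greedy_color(vertices, edges):
    """Greedy graph coloring, visiting vertices by decreasing degree.

    Returns a list of color classes (sets of vertices); no two vertices
    joined by an edge end up in the same class.
    """
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    color = {}
    for v in sorted(vertices, key=lambda v: (-len(adj[v]), v)):
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    classes = defaultdict(set)
    for v, c in color.items():
        classes[c].add(v)
    return [classes[c] for c in sorted(classes)]


def reboot_groups(nodes, instances):
    """Steps 1 and 2: `instances` lists (primary, secondary) pairs of
    DRBD instances, running or not."""
    # Step 1: a node set never holds both the primary and the secondary
    # of the same instance.
    groups = []
    for node_set in greedy_color(nodes, instances):
        # Secondaries used by the instances whose primary is each node.
        secs = {n: {s for p, s in instances if p == n} for n in node_set}
        # Step 2: two nodes conflict if their instances share a secondary,
        # since migrating both would pile their primaries onto one node.
        members = sorted(node_set)
        conflicts = [(a, b) for i, a in enumerate(members)
                     for b in members[i + 1:] if secs[a] & secs[b]]
        groups.extend(greedy_color(members, conflicts))
    return groups
```

With two instances `n1 -> n3` and `n2 -> n3`, step 1 yields the sets `{n3}` and `{n1, n2}`; step 2 then splits the latter, because both nodes would migrate their primaries onto `n3`, giving the groups `{n3}`, `{n1}`, `{n2}`.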
>>  Non-DRBD
>>  ++++++++
>> @@ -99,45 +126,28 @@ algorithm might be safe. This perhaps would be a good reason to consider
>>  managing better RBD pools, if those are implemented on top of nodes
>>  storage, rather than on dedicated storage machines.
>>
>> -Executing rolling maintenances
>> -------------------------------
>> -
>> -Hroller accepts commands to run to do maintenance automatically. These
>> -are going to be run on the machine hroller runs on, and take a node name
>> -as input. They have then to gain access to the target node (via ssh,
>> -restricted commands, or some other means) and perform their duty.
>> -
>> -1) A command (--check-cmd) will be called on all selected online nodes
>> -to check whether a node needs maintenance. Hroller will proceed only on
>> -nodes that respond positively to this invocation.
>> -FIXME: decide about -D
>> -2) Hroller will evacuate the node of all primary instances.
>> -3) A command (--maint-cmd) will be called on a node to do the actual
>> -maintenance operation.  It should do any operation needed to perform the
>> -maintenance including triggering the actual reboot.
>> -3) A command (--verify-cmd) will be called to check that the operation
>> -was successful, it has to wait until the target node is back up (and
>> -decide after how long it should give up) and perform the verification.
>> -If it's not successful hroller will stop and not proceed with other
>> -nodes.
>> -4) The master node will be kept last, but will not otherwise be treated
>> -specially. If hroller was running on the master node, care must be
>> -exercised as its maintenance will have interrupted the software itself,
>> -and as such the verification step will not happen. This will not
>> -automatically be taken care of, in the first version. An additional flag
>> -to just skip the master node will be present as well, in case that's
>> -preferred.
>> -
>> -
>>  Future work
>>  ===========
>>
>> +Hroller should eventually be able to execute rolling maintenances,
>> +rather than just calculate them. For this to be done safely, one of the
>> +following must happen:
>> +
>> +- HRoller handles unrelated cluster jobs running at the same time as the
>> +  rolling maintenance, and thus recalculates the maintenance at each
>> +  step
>> +- HRoller can selectively drain the cluster, so that it is sure that
>> +  only the rolling maintenance is going on
>> +
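A minimal Python sketch of the first option: replanning from fresh cluster state before each group, rather than computing the whole schedule once. All three callables (`read_cluster`, `plan`, `maintain`) and the `"nodes"` key of the state are hypothetical placeholders for real cluster queries, the node set computation, and the per-group evacuate/reboot/migrate-back cycle; error handling and job draining are left out.

```python
def run_rolling_maintenance(read_cluster, plan, maintain):
    """Maintain one group at a time, replanning after every group.

    read_cluster():      returns the current cluster state (hypothetical)
    plan(state, pending): returns a list of node groups for the nodes in
                          `pending`, computed against `state` (hypothetical)
    maintain(group):     evacuates, reboots and restores one group
    """
    state = read_cluster()
    pending = set(state["nodes"])
    order = []
    while pending:
        groups = plan(state, pending)
        # Only the first group of each plan is executed; the rest is
        # recomputed next round, since the cluster may have changed.
        group = set(groups[0]) & pending if groups else set()
        if not group:
            raise RuntimeError("planner made no progress")
        maintain(group)
        order.append(group)
        pending -= group
        # Unrelated jobs may have changed the cluster meanwhile: re-read it.
        state = read_cluster()
    return order
```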
>>  DRBD nodes' ``replace-disks``' functionality should be implemented. Note
>>  that when we will support a DRBD version that allows multi-secondary
>>  this can be done safely, without losing replication at any time, by
>>  adding a temporary secondary and only when the sync is finished dropping
>>  the previous one.
>>
>> +Non-redundant (plain or file) instances should also have a way to be
>> +moved off (via DRBD conversion or plain storage live migration).
>
> These can already be moved via gnt-instance move. Why introduce a new
> method?
>

To move them without a reboot.
It doesn't matter for this design, though, so I'll mention instance move as well.

Thanks,

Guido
