On Fri, May 17, 2013 at 6:05 PM, Bernardo Dal Seno <[email protected]> wrote:

> On 17 May 2013 11:46, Michele Tartara <[email protected]> wrote:
> > Ganeti is currently not able to detect a legit shutdown request performed by a
> > user from inside a Xen domain.
> >
> > This patch provides a design document to implement a mechanism able to cope with
> > such events.
> >
> > Signed-off-by: Michele Tartara <[email protected]>
> > ---
> >  Makefile.am                      |  1 +
> >  doc/design-draft.rst             |  1 +
> >  doc/design-internal-shutdown.rst | 72 ++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 74 insertions(+)
> >  create mode 100644 doc/design-internal-shutdown.rst
> >
> > diff --git a/Makefile.am b/Makefile.am
> > index 037cf53..f66624e 100644
> > --- a/Makefile.am
> > +++ b/Makefile.am
> > @@ -410,6 +410,7 @@ docinput = \
> >         doc/design-htools-2.3.rst \
> >         doc/design-http-server.rst \
> >         doc/design-impexp2.rst \
> > +       doc/design-internal-shutdown.rst \
> >         doc/design-lu-generated-jobs.rst \
> >         doc/design-linuxha.rst \
> >         doc/design-multi-reloc.rst \
> > diff --git a/doc/design-draft.rst b/doc/design-draft.rst
> > index ccb2f93..9a1d2b1 100644
> > --- a/doc/design-draft.rst
> > +++ b/doc/design-draft.rst
> > @@ -19,6 +19,7 @@ Design document drafts
> >     design-storagetypes.rst
> >     design-reason-trail.rst
> >     design-device-uuid-name.rst
> > +   design-internal-shutdown.rst
> >
> >  .. vim: set textwidth=72 :
> >  .. Local Variables:
> > diff --git a/doc/design-internal-shutdown.rst b/doc/design-internal-shutdown.rst
> > new file mode 100644
> > index 0000000..836d00c
> > --- /dev/null
> > +++ b/doc/design-internal-shutdown.rst
> > @@ -0,0 +1,72 @@
> > +============================================================
> > +Detection of user-initiated shutdown from inside an instance
> > +============================================================
> > +
> > +.. contents:: :depth: 2
> > +
> > +This is a design document detailing the implementation of a way for Ganeti to
> > +detect whether a machine marked as up but not running was shut down gracefully
> > +by the user from inside the machine itself.
> > +
> > +Current state and shortcomings
> > +==============================
> > +
> > +Ganeti keeps track of the desired status of instances in order to be able to
> > +take proper actions (e.g. reboot) on the ones that happen to crash.
> > +Currently, the only way to properly shut down a machine is through Ganeti's own
> > +commands, which mark the instance as ``ADMIN_down``.
> > +If a user shuts down an instance from inside, through the proper command of the
> > +operating system it is running, the instance will be shut down gracefully, but
> > +Ganeti is not aware of that: the desired status of the instance will still be
> > +marked as ``running``, so when the watcher realises that the instance is down,
> > +it will restart it. This behaviour is usually not what the user expects.
> > +
> > +Proposed changes
> > +================
> > +
> > +We propose to modify Ganeti in such a way that it will detect when an instance
> > +was shut down because of an explicit user request. When such a situation is
> > +detected, the state of the instance will be set to ``ADMIN_down``, as intended by
> > +the user.
> > +
> > +This design document applies to the Xen backend of Ganeti, because it uses
> > +features specific to that hypervisor.
> > +
> > +Implementation
> > +==============
> > +
> > +Xen knows why a domain is being shut down (a crash or an explicit shutdown
> > +or poweroff request), but such information is not usually readily available
> > +externally, because all such cases lead to the virtual machine being destroyed
> > +immediately after the event is detected.
> > +
> > +Still, Xen allows the instance configuration file to define what action is to be
> > +taken in all those cases through the ``on_poweroff``, ``on_shutdown`` and
> > +``on_crash`` variables. When they are set to ``preserve``, Xen does not
> > +destroy the domains automatically.
> > +
> > +When the domain is not destroyed, it can be viewed by using ``xm list`` (or ``xl
> > +list`` in newer Xen versions), and the ``State`` field of the output will
> > +provide useful information.
> > +
> > +If the state is ``----c-`` it means the instance has crashed.
> > +
> > +If the state is ``---s--`` it means the instance was properly shut down.
> > +
> > +If the instance was properly shut down and it is still marked as ``running`` by
> > +Ganeti, it means that it was shut down from inside by the user, and the Ganeti
> > +status of the instance needs to be changed to ``ADMIN_down``.
> > +
> > +This will be done at regular intervals by the group watcher, just before
> > +deciding which instances to reboot.
> > +
> > +On top of that, at the same time, the watcher will also need to issue ``xm
> > +destroy`` commands for all the domains that are in crashed or shutdown state,
> > +since this will not be done automatically by Xen anymore because of the
> > +``preserve`` setting in their config files.
>
> I think that that should be done also by gnt-instance start and
> similar commands, as they could be issued before the watcher runs.
>
> Also, what happens to output of gnt-instance list? Will it be correct?
>
Read my reply to Guido's emails and you'll find the answer to your
questions. :-)

Thanks for pointing it out, though.
I'll soon send a revised design doc containing those clarifications.

Thanks,
Michele
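
For illustration only, the watcher step described in the quoted design could
be sketched roughly as follows. This is not Ganeti code: the function names,
the action labels ("mark_admin_down", "destroy") and the sample listing are
all hypothetical. The sketch assumes the usual six-character State column of
``xm list`` / ``xl list`` (flags in the order r, b, p, s, c, d) and uses the
``running`` / ``ADMIN_down`` status strings from the design text.

```python
# Hypothetical sketch of the watcher decision; not part of Ganeti.

# Assumed positions of the flags in the State column: r b p s c d
SHUTDOWN_FLAG_INDEX = 3
CRASHED_FLAG_INDEX = 4

# Made-up listing in the general shape of `xl list` output.
SAMPLE_OUTPUT = """
Name        ID   Mem VCPUs  State   Time(s)
Domain-0     0  1024     2  r-----    100.0
inst1        3   512     1  ---s--     12.3
inst2        4   512     1  ----c-      5.0
"""


def classify_state(state):
    """Return 'crashed', 'shutdown' or 'other' for a State string."""
    if len(state) != 6:
        raise ValueError("unexpected State field: %r" % (state,))
    if state[CRASHED_FLAG_INDEX] == "c":
        return "crashed"
    if state[SHUTDOWN_FLAG_INDEX] == "s":
        return "shutdown"
    return "other"


def parse_domain_list(output):
    """Map each domain name to its State string."""
    states = {}
    for line in output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # ignore lines that do not look like a domain row
        # Columns: Name ID Mem VCPUs State Time(s)
        states[fields[0]] = fields[4]
    return states


def plan_actions(states, desired):
    """Decide what to do with each preserved domain.

    `desired` maps instance names to the status recorded by Ganeti.
    A cleanly shut down instance that is still marked ``running`` is
    first marked ``ADMIN_down``; every crashed or shut down domain is
    then destroyed, since ``preserve`` stops Xen from doing so itself.
    """
    actions = {}
    for name, state in states.items():
        kind = classify_state(state)
        if kind == "other":
            continue
        steps = []
        if kind == "shutdown" and desired.get(name) == "running":
            steps.append("mark_admin_down")
        steps.append("destroy")
        actions[name] = steps
    return actions
```

A caller would feed the real listing to parse_domain_list() and pass the
result, together with the configured instance statuses, to plan_actions()
before the watcher decides which instances to reboot.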
