On Fri, May 17, 2013 at 6:05 PM, Bernardo Dal Seno <[email protected]> wrote:
> On 17 May 2013 11:46, Michele Tartara <[email protected]> wrote:
> > Ganeti is currently not able to detect a legit shutdown request performed by a
> > user from inside a Xen domain.
> >
> > This patch provides a design document to implement a mechanism able to cope with
> > such events.
> >
> > Signed-off-by: Michele Tartara <[email protected]>
> > ---
> >  Makefile.am | 1 +
> >  doc/design-draft.rst | 1 +
> >  doc/design-internal-shutdown.rst | 72 ++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 74 insertions(+)
> >  create mode 100644 doc/design-internal-shutdown.rst
> >
> > diff --git a/Makefile.am b/Makefile.am
> > index 037cf53..f66624e 100644
> > --- a/Makefile.am
> > +++ b/Makefile.am
> > @@ -410,6 +410,7 @@ docinput = \
> >          doc/design-htools-2.3.rst \
> >          doc/design-http-server.rst \
> >          doc/design-impexp2.rst \
> > +        doc/design-internal-shutdown.rst \
> >          doc/design-lu-generated-jobs.rst \
> >          doc/design-linuxha.rst \
> >          doc/design-multi-reloc.rst \
> > diff --git a/doc/design-draft.rst b/doc/design-draft.rst
> > index ccb2f93..9a1d2b1 100644
> > --- a/doc/design-draft.rst
> > +++ b/doc/design-draft.rst
> > @@ -19,6 +19,7 @@ Design document drafts
> >     design-storagetypes.rst
> >     design-reason-trail.rst
> >     design-device-uuid-name.rst
> > +   design-internal-shutdown.rst
> >
> >  .. vim: set textwidth=72 :
> >  .. Local Variables:
> > diff --git a/doc/design-internal-shutdown.rst b/doc/design-internal-shutdown.rst
> > new file mode 100644
> > index 0000000..836d00c
> > --- /dev/null
> > +++ b/doc/design-internal-shutdown.rst
> > @@ -0,0 +1,72 @@
> > +============================================================
> > +Detection of user-initiated shutdown from inside an instance
> > +============================================================
> > +
> > +.. contents:: :depth: 2
> > +
> > +This is a design document detailing the implementation of a way for Ganeti to
> > +detect whether a machine marked as up but not running was shutdown gracefully
> > +by the user from inside the machine itself.
> > +
> > +Current state and shortcomings
> > +==============================
> > +
> > +Ganeti keeps track of the desired status of instances in order to be able to
> > +take proper actions (e.g.: reboot) on the ones that happen to crash.
> > +Currently, the only way to properly shut down a machine is through Ganeti's own
> > +commands, that will mark an instance as ``ADMIN_down``.
> > +If a user shuts down an instance from inside, through the proper command of the
> > +operating system it is running, the instance will be shutdown gracefully, but
> > +Ganeti is not aware of that: the desired status of the instance will still be
> > +marked as ``running``, so when the watcher realises that the instance is down,
> > +it will restart it. This behaviour is usually not what the user expects.
> > +
> > +Proposed changes
> > +================
> > +
> > +We propose to modify Ganeti in such a way that it will detect when an instance
> > +was shutdown because of an explicit user request. When such a situation is
> > +detected, the state of the instance will be set to ADMIN_down, as intended by
> > +the user.
> > +
> > +This design document applies to the Xen backend of Ganeti, because it uses
> > +features specific of such hypervisor.
> > +
> > +Implementation
> > +==============
> > +
> > +Xen knows why a domain is being shut down (a crash or an explicit shutdown
> > +or poweroff request), but such information is not usually readily available
> > +externally, because all such cases lead to the virtual machine being destroyed
> > +immediately after the event is detected.
> > +
> > +Still, Xen allows the instance configuration file to define what action to be
> > +taken in all those cases through the ``on_poweroff``, ``on_shutdown`` and
> > +``on_crash`` variables. By setting them to ``preserve``, Xen will avoid
> > +destroying the domains automatically.
> > +
> > +When the domain is not destroyed, it can be viewed by using ``xm list`` (or ``xl
> > +list`` in newer Xen versions), and the ``State`` field of the output will
> > +provide useful information.
> > +
> > +If the state is ``----c-`` it means the instance has crashed.
> > +
> > +If the state is ``---s--`` it means the instance was properly shutdown.
> > +
> > +If the instance was properly shutdown and it is still marked as ``running`` by
> > +Ganeti, it means that it was shutdown from inside by the user, and the ganeti
> > +status of the instance needs to be changed to ``ADMIN_down``.
> > +
> > +This will be done at regular intervals by the group watcher, just before
> > +deciding which instances to reboot.
> > +
> > +On top of that, at the same times, the watcher will also need to issue ``xm
> > +destroy`` commands for all the domains that are in crashed or shutdown state,
> > +since this will not be done automatically by Xen anymore because of the
> > +``preserve`` setting in their config files.
>
> I think that that should be done also by gnt-instance start and
> similar commands, as they could be issued before the watcher runs.
>
> Also, what happens to output of gnt-instance list? Will it be correct?
>

Read my reply to Guido's emails and you'll find the answer to your questions. :-)
Thanks for pointing it out, though. I'll soon send a revised design doc
containing those clarifications.

Thanks,
Michele
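
For anyone following the thread, the watcher-side check described in the
quoted design could be sketched roughly as below. This is only an
illustration under assumptions: the function and constant names are
hypothetical and not part of Ganeti, the state strings are assumed to be
the ``State`` column of ``xm list`` exactly as shown in the design, and
restarting a crashed instance stands in for the watcher's existing
behaviour.

```python
from typing import Optional, Tuple

# Hypothetical action labels, used only in this sketch (not Ganeti's API).
ACTION_MARK_DOWN = "mark-admin-down"
ACTION_RESTART = "restart"


def classify_xen_state(state: str) -> str:
    """Classify the State column of ``xm list`` for a preserved domain.

    Per the design: ``----c-`` means crashed, ``---s--`` means a clean
    shutdown. Anything else is treated as 'other' (e.g. still running).
    """
    if "c" in state:
        return "crashed"
    if "s" in state:
        return "shutdown"
    return "other"


def watcher_decision(state: str, admin_running: bool) -> Tuple[bool, Optional[str]]:
    """Return (needs_destroy, action) for one domain.

    needs_destroy is True for crashed or shut-down domains, because with
    ``preserve`` Xen no longer destroys them on its own.
    """
    kind = classify_xen_state(state)
    if kind == "other":
        return (False, None)
    if not admin_running:
        # Ganeti already considers the instance down: only clean up.
        return (True, None)
    if kind == "shutdown":
        # Shut down from inside by the user: record it as ADMIN_down.
        return (True, ACTION_MARK_DOWN)
    # Crashed while it was supposed to be running.
    return (True, ACTION_RESTART)
```

With this split, ``watcher_decision("---s--", True)`` asks for a destroy
plus a change to ``ADMIN_down``, while the same state on an instance that
Ganeti already considers down only needs the destroy.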
