What is the best method for gracefully shutting down LXC containers in a production environment?

By graceful, I mean that apps such as databases get a shutdown signal, so they can save their data to disk, complete any pending network ops, flush buffers, close filehandles, etc. without data loss.

Presently, the script /etc/init.d/lxc that ships for Ubuntu just does an lxc-stop on any container listed in /etc/default/lxc. Since lxc-stop is like "pulling the power cord", that seems like an irresponsible and dangerous thing to do. The script also ignores any LXC containers not listed in /etc/default/lxc. It needs to be fixed.

There is an RPM package for OpenSuse called rclxc at http://download.opensuse.org/repositories/home:/aljex/ which has an init script for LXC. It uses the following technique:

lxcstop () {
    typeset -i PID=0
    lxc-ps -- -C init -o pid |while read CN PID ;do
        [[ $PID -gt 1 ]] || continue
        [[ "${1:-$CN}" = "$CN" ]] || continue
        grep -q 'p0::powerfail:/sbin/init 0' ${LXC_SRV}/${CN}/etc/inittab || continue
        kill -PWR $PID
    done
}

It first uses lxc-ps to find the PID of each container's init process, then sends that process a SIGPWR (after kindly checking .../etc/inittab to make sure init will handle it).

The Python script posted yesterday has its own technique. It searches /proc/CONTAINER_PIDs/exe for a link to "/sbin/init", and then sends a SIGINT to those. That seems like a reasonable approach, but all of the Ubuntu init scripts are /bin/sh shell scripts, not Python scripts.
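
For comparison, here is a rough /bin/sh sketch of the same /proc scan. The function name find_pids_by_exe is my own invention, and unlike the Python script it scans every PID on the host rather than only container PIDs. How the exe link of a container's init appears from the host depends on how the container's root filesystem is set up, so the target path to match is an assumption you would need to check on your own system.

```sh
#!/bin/sh
# find_pids_by_exe TARGET  (hypothetical helper, not part of LXC)
# Print the PID of every process whose /proc/PID/exe link points at TARGET.
# PID 1 (the host's own init) is always skipped.
find_pids_by_exe() {
    target=$1
    for exe in /proc/[0-9]*/exe; do
        # Processes may exit, or be unreadable without root; skip those.
        link=$(readlink "$exe" 2>/dev/null) || continue
        [ "$link" = "$target" ] || continue
        pid=${exe#/proc/}
        pid=${pid%/exe}
        [ "$pid" -gt 1 ] || continue
        echo "$pid"
    done
}
```

The PIDs it prints could then be fed to kill, one at a time, with whichever signal turns out to be the right one.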

There is also an init script at http://lxc.teegra.net/ for Arch Linux, but as the page says, "... this one is quite simplistic and does not invoke *shutdown*/*halt* or *init 0* in the containers. Also, it might hang on waiting for a container to start." Like the Ubuntu script, it just calls lxc-stop, i.e., pulls the power cable on your containers. Not graceful.

Several of the other scripts or tutorials I found are also outdated or incomplete. For example, many still recommend running the container using "screen", from before the lxc-start -d option was available.

As an alternate approach, what about running:

lxc-attach -n CONTAINER shutdown -h now

Is there any drawback to doing that instead? The Python script and the OpenSuse init script mentioned above both need root access, whereas lxc-attach would theoretically work without root once User Namespaces are fully implemented.

Other considerations for a production-quality script:

1. A watchdog timeout, so that if a process hangs during shutdown, eventually lxc-stop would get called anyway. (A broken LXC process should not prevent a host O.S. shutdown!) Could a timeout option be added to lxc-wait for this feature?

2. A method that does not require root, the way virsh does not require root to start or stop a VM. (Maybe this needs to wait.)

3. An "official" command name for graceful shutdowns from the host. I propose lxc-shutdown. (There is an unofficial OpenSuse package from rdannert that has a "lxc-shutdown-all" command, but I have not seen the name "lxc-shutdown" used anywhere.)

4. Which signal?  SIGINT?  SIGPWR?  Both?
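
To make item 1 concrete, here is a minimal sketch of the watchdog idea in plain /bin/sh. The function names (wait_for_exit, graceful_stop), the LXC_STOP override variable, the 60-second default and the SIGPWR default are all my own assumptions, not anything LXC provides. The only real LXC command used is "lxc-stop -n NAME". It assumes you already know the host-side PID of the container's init.

```sh
#!/bin/sh
# Hypothetical helpers; names and defaults are assumptions, not part of LXC.

# wait_for_exit PID TIMEOUT
# Poll once per second until PID is gone or TIMEOUT seconds have passed.
# Returns 0 if the process exited, 1 if the timeout was reached.
wait_for_exit() {
    pid=$1
    remaining=$2
    while kill -0 "$pid" 2>/dev/null; do
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# graceful_stop CONTAINER INIT_PID [TIMEOUT] [SIGNAL]
# Signal the container's init, wait up to TIMEOUT seconds for it to exit,
# and only then fall back to a hard stop.
graceful_stop() {
    name=$1
    init_pid=$2
    timeout=${3:-60}
    sig=${4:-PWR}
    kill -s "$sig" "$init_pid" 2>/dev/null
    if wait_for_exit "$init_pid" "$timeout"; then
        return 0
    fi
    echo "$name: no clean shutdown after ${timeout}s, forcing stop" >&2
    # LXC_STOP lets a test substitute a harmless command for lxc-stop.
    ${LXC_STOP:-lxc-stop} -n "$name"
}
```

An init script could call this once per container, e.g. graceful_stop web1 "$PID" 120 PWR (where "web1" is a made-up container name), so that a hung container delays the host shutdown by at most the timeout.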


I am looking to put some development and testing into this. If readers would kindly post their own "best practices", I could create a new lxc-shutdown command and an init script that uses it.


Thank You,
Derek Simkowiak

P.S.> The last major discussion I found about this was from ~two years ago:
http://www.mail-archive.com/lxc-users@lists.sourceforge.net/msg00040.html

