On 12/6/2010 2:42 AM, Trent W. Buck wrote:
This post describes my attempts to get "clean" shutdown of Ubuntu 10.04
containers. The goal here is that a "shutdown -h now" of the dom0
should not result in a potentially inconsistent domU postgres database,
cf. a naive lxc-stop.
As at Ubuntu 10.04 with lxc 0.7.2, lxc-start detects that a container
has halted by 1) seeing a reboot event in <container>/var/run/utmp; or
2) seeing <container>'s PID 1 terminate.
Ubuntu 10.04 simply REQUIRES /var/run to be a tmpfs; this is hard-coded
into mountall's (upstart's) /lib/init/fstab. Without it, the most
immediate issue is that /var/run/ifstate isn't reaped on reboot, ifup(8)
thinks lo (at least) is already configured, and the boot process hangs
waiting for the network.
Unfortunately, lxc 0.7's utmp detection requires /var/run to NOT be a
tmpfs. The shipped lxc-ubuntu script works around this by deleting the
ifstate file and not mounting a tmpfs on /var/run, but to me that is
simply waiting for something else to assume /var/run is empty. It also
doesn't cope with a mountall upgrade rewriting /lib/init/fstab.
More or less by accident, I discovered that I can tell lxc-start that
the container is ready to halt by "crashing" upstart:
container# kill -SEGV 1
Likewise I can spoof a ctrl-alt-delete event in the container with:
dom0# pkill -INT lxc-start
I automate the former signalling at the end of shutdowns thusly:
chroot $template_dir dpkg-divert --quiet --rename /sbin/reboot
chroot $template_dir tee >/dev/null /sbin/reboot <<-EOF
#!/bin/bash
while getopts nwdfiph opt
do [[ f = \$opt ]] && exec kill -SEGV 1
done
exec -a "\$0" "\$0.distrib" "\$@"
EOF
chroot $template_dir chmod +x /sbin/reboot
chroot $template_dir ln -s reboot.distrib /sbin/halt.distrib
chroot $template_dir ln -s reboot.distrib /sbin/poweroff.distrib
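The option scan inside the wrapper above can be read in isolation. The sketch below is only an illustration: has_force_flag is a hypothetical helper name, and the reading that the wrapper keys on -f because the last step of a shutdown calls reboot/halt with that flag is an assumption, not something stated in the post.

```shell
#!/bin/sh
# Hypothetical helper: succeed if -f appears among the options,
# using the same option string as the wrapper (nwdfiph).
has_force_flag() {
    OPTIND=1                          # restart getopts on every call
    while getopts nwdfiph opt; do
        [ "$opt" = f ] && return 0    # -f seen: the wrapper would signal PID 1 here
    done
    return 1                          # no -f: the wrapper would exec the real binary
}

if has_force_flag -d -f -i; then
    echo "would signal PID 1"
else
    echo "would hand over to reboot.distrib"
fi
```

Any invocation without -f falls through, so ordinary calls to reboot, halt or poweroff still reach the diverted binary untouched.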
I use the latter in my customized /etc/init.d/lxc stop rule.
Note that the lxc-wait calls SHOULD be parallelized, but this is not
possible as at lxc 0.7.2 :-(
Sure it is.
I parallelize the shutdowns (in any version, including 0.7.2) by issuing
all the stop requests (a SIGPWR to each container's init, in my script)
in parallel without looking or waiting, then, in a separate following
step, looping until no containers are running.
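As a rough illustration of that two-step pattern, here is a minimal sketch that runs without lxc. Files in a scratch directory stand in for running containers, and request_stop and is_running are hypothetical helpers, not lxc commands; the simulated one-second shutdown is likewise an assumption for the demo.

```shell
#!/bin/sh
# Sketch only: a file per "container" marks it as running.
state=$(mktemp -d)
for name in vps001 vps002 vps003; do : > "$state/$name"; done

# Pretend each shutdown takes about a second and completes on its own.
# In the real script this step is "kill -PWR <container init pid>".
request_stop() { ( sleep 1; rm -f "$state/$1" ) & }

# In the real script this step checks the lxc-info output.
is_running() { [ -e "$state/$1" ]; }

# Step 1: ask every container to stop, without waiting on any of them.
for name in vps001 vps002 vps003; do request_stop "$name"; done

# Step 2: poll until none is left running.
polls=0
while :; do
    left=0
    for name in vps001 vps002 vps003; do
        if is_running "$name"; then left=$((left + 1)); fi
    done
    [ "$left" -eq 0 ] && break
    polls=$((polls + 1))
    sleep 1
done
echo "all stopped after $polls poll(s)"
```

Because the requests are all issued before any waiting starts, the total time is bounded by the slowest container rather than by the sum of all of them.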
Here is my openSUSE init.d/lxc:
https://build.opensuse.org/package/files?package=lxc&project=home:aljex
And the packages:
http://download.opensuse.org/repositories/home:/aljex/*/lxc-0.7.2*.rpm
It makes assumptions that are wrong for Ubuntu and is more limited than
you may want in terms of what it even tries to handle. But that's beside
the point of parallel shutdowns.
* cgroup handling includes a particular stack of override logic for
  possible cgroup mount points that makes sense to me.
  - start with built-in default /var/run/lxc/cgroup, and name it "lxc" so
    as not to conflict with any other cgroup setup by default.
  - if you defined something in $LXC_CONF, prefer it over default
  - if kernel is providing /sys/fs/cgroup automatically, prefer that over
    either default or $LXC_CONF
  - if a cgroup named "lxc" is already mounted, prefer that over all else
* assumes lxc 0.7.2 because the script is part of a lxc-0.7.2 rpm
  - removes the shutdown/reboot watchdog functions that were needed in
    0.6.5 but are built in to 0.7.2 now.
* only starts containers that are defined by $LXC_ETC/*/config
* only shuts down containers that it started
* the stop function greps for /sbin/init in container inittab instead of
  trying to allow for any random container pid #1
* no provision for application/service containers, just whole systems
  started with /sbin/init
* starts containers in screen
  - I have not figured out what it would take to get nice behavior out of
    lxc-console yet and screen is both easy and standard.
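The cgroup override order from the first bullet can be sketched as a small function. This is an illustration under assumptions, not the script itself: pick_cgroup and its three parameters are hypothetical, introduced so the precedence can be exercised against a sample mounts table instead of the live /proc/mounts, and the check that the mounted path is a directory is left out.

```shell
#!/bin/sh
# Hypothetical helper showing the precedence only.
# $1: mounts table (normally /proc/mounts)
# $2: mount point taken from lxc.conf, or empty
# $3: directory the kernel may provide (normally /sys/fs/cgroup)
pick_cgroup() {
    point=/var/run/lxc/cgroup            # built-in default
    [ -n "$2" ] && point=$2              # lxc.conf beats the default
    [ -d "$3" ] && point=$3              # kernel-provided directory beats both
    # An existing cgroup mount beats everything; one named "lxc" wins outright.
    found=$(awk '$3 == "cgroup" { p = $2; if ($1 == "lxc") exit } END { print p }' "$1")
    [ -n "$found" ] && point=$found
    echo "$point"
}

pick_cgroup /proc/mounts "" /sys/fs/cgroup
```

Each later rule simply overwrites the earlier choice, so the order of the lines is the order of precedence.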
The $LXC_CONF (/etc/lxc/lxc.conf) referenced at the top usually does not
exist, so everything that happens is visible right in the script.
I'm using this in production. So far so good.
typical usage:
nj10:~ # rclxc status
Checking for LXC containers...
running
nj10:~ # rclxc list
Listing LXC containers...
'vps001' is RUNNING
'vps002' is RUNNING
'vps003' is RUNNING
'vps004' is RUNNING
'vps005' is RUNNING
'vps006' is RUNNING
'vps007' is RUNNING
'vps008' is RUNNING
'vps009' is RUNNING
'vps011' is RUNNING
'vps012' is RUNNING
'vps013' is RUNNING
nj10:~ # rclxc stop vps008
Shutting down LXC containers...
done
nj10:~ # rclxc list
Listing LXC containers...
'vps001' is RUNNING
'vps002' is RUNNING
'vps003' is RUNNING
'vps004' is RUNNING
'vps005' is RUNNING
'vps006' is RUNNING
'vps007' is RUNNING
'vps008' is STOPPED
'vps009' is RUNNING
'vps011' is RUNNING
'vps012' is RUNNING
'vps013' is RUNNING
nj10:~ # rclxc status
Checking for LXC containers...
running
nj10:~ # rclxc stop
Shutting down LXC containers...
done
nj10:~ # rclxc status
Checking for LXC containers...
unused
nj10:~ # rclxc list
Listing LXC containers...
'vps001' is STOPPED
'vps002' is STOPPED
'vps003' is STOPPED
'vps004' is STOPPED
'vps005' is STOPPED
'vps006' is STOPPED
'vps007' is STOPPED
'vps008' is STOPPED
'vps009' is STOPPED
'vps011' is STOPPED
'vps012' is STOPPED
'vps013' is STOPPED
nj10:~ # time rclxc start
Starting LXC containers...
done
real 0m0.242s
user 0m0.012s
sys 0m0.000s
nj10:~ # rclxc list
Listing LXC containers...
'vps001' is RUNNING
'vps002' is RUNNING
'vps003' is RUNNING
'vps004' is RUNNING
'vps005' is RUNNING
'vps006' is RUNNING
'vps007' is RUNNING
'vps008' is RUNNING
'vps009' is RUNNING
'vps011' is RUNNING
'vps012' is RUNNING
'vps013' is RUNNING
nj10:~ # screen -r vps013
INIT: version 2.88 booting
INIT: Entering runlevel: 3
blogd: can not set console device to /dev/pts/34: Device or resource busy
Master Resource Control: previous runlevel: N, switching to runlevel:3
Initializing random number generator done
Starting syslog services done
Starting D-Bus daemon done
No keyboard map to load
Loading compose table winkeys shiftctrl latin1.add done
Stop Unicode mode done
Setting up (localfs) network interfaces:
lo
lo IP address: 127.0.0.1/8
IP address: 127.0.0.2/8
lo done
eth0
eth0 IP address: 71.187.206.90/24
eth0 done
Setting up service (localfs) network . . . . . . . . . . done
Starting SSH daemon done
Loading CPUFreq modules (CPUFreq not supported)
Starting HAL daemon done
Setting up (remotefs) network interfaces:
Setting up service (remotefs) network . . . . . . . . . . done
Re-Starting syslog services done
Starting auditd The audit system is disabled
done
Starting incron done
Starting mail service (Postfix) done
Starting CRON daemon done
Starting rpcbind done
Starting rsync daemon done
Starting smartd unused
Starting vsftpd done
Starting INET services. (xinetd) done
Master Resource Control: runlevel 3 has been reached
Skipped services in runlevel 3: splash smartd
Welcome to openSUSE 11.3 "Teal" - Kernel 2.6.37-rc3-3-default (console).
nj10-013 login:
[detached]
nj10:~ # time rclxc stop
Shutting down LXC containers...
done
real 0m8.537s
user 0m0.048s
sys 0m0.124s
nj10:~ # rclxc list
Listing LXC containers...
'vps001' is STOPPED
'vps002' is STOPPED
'vps003' is STOPPED
'vps004' is STOPPED
'vps005' is STOPPED
'vps006' is STOPPED
'vps007' is STOPPED
'vps008' is STOPPED
'vps009' is STOPPED
'vps011' is STOPPED
'vps012' is STOPPED
'vps013' is STOPPED
nj10:~ # screen -ls
No Sockets found in /var/run/screens/S-root.
nj10:~ # lxc-ps --lxc auxwww
CONTAINER USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
nj10:~ #
--
bkw
#!/bin/sh
# /etc/init.d/lxc
# and its symbolic link
# /usr/sbin/rclxc
#
# System startup script for LXC containers.
# For lxc 0.7.2 which doesn't require an external monitor process to perform
# the lxc-stop when a container's init process requests init 0|1|6 .
#
# 20101108 - Brian K. White - br...@aljex.com
#
### BEGIN INIT INFO
# Provides: lxc
# Required-Start: $ALL
# Should-Start:
# Required-Stop: $ALL
# Should-Stop:
# Default-Start: 3 5
# Default-Stop: 0 1 2 6
# Short-Description: LXC Linux Containers
# Description: Start/Stop LXC containers.
### END INIT INFO
. /etc/rc.status
LXC_ETC=/etc/lxc
LXC_SRV=/srv/lxc
CGROUP_MOUNT_POINT=/var/run/lxc/cgroup
CGROUP_MOUNT_NAME=lxc
CGROUP_MOUNTED=false
CGROUP_RELEASE_AGENT="/usr/sbin/lxc_cgroup_release_agent"
LXC_CONF=${LXC_ETC}/lxc.conf
[[ -s $LXC_CONF ]] && . $LXC_CONF
# Various possible overrides to cgroup mount point.
# If kernel supplies cgroup mount point, prefer it.
[[ -d /sys/fs/cgroup ]] && CGROUP_MOUNT_POINT=/sys/fs/cgroup CGROUP_MOUNT_NAME=cgroup
# If cgroup already mounted, use it no matter where it is.
# If multiple cgroup mounts, prefer the one named lxc if any.
# (awk needs "==" here; a single "=" assigns and always exits on the first cgroup mount)
eval `awk 'BEGIN{P="";N=""}END{print("cgmp="P" cgmn="N)}($3=="cgroup"){N=$1;P=$2;if($1=="lxc")exit}' /proc/mounts`
[[ "$cgmn" && "$cgmp" && -d "$cgmp" ]] && CGROUP_MOUNT_POINT=$cgmp CGROUP_MOUNT_NAME=$cgmn CGROUP_MOUNTED=true
lxcstrt () {
  $CGROUP_MOUNTED || {
    [[ -d $CGROUP_MOUNT_POINT ]] || mkdir -p $CGROUP_MOUNT_POINT
    mount -t cgroup $CGROUP_MOUNT_NAME $CGROUP_MOUNT_POINT
  }
  echo "$CGROUP_RELEASE_AGENT" >${CGROUP_MOUNT_POINT}/release_agent
  echo 1 >${CGROUP_MOUNT_POINT}/notify_on_release
  cd $LXC_ETC
  for CF in */config ; do
    CN=${CF%/*}
    [[ "${1:-$CN}" = "$CN" ]] || continue
    screen -dmS $CN lxc-start -f $CF -n $CN
  done
}
lxcstop () {
  typeset -i PID=0
  lxc-ps -C init -opid |while read CN PID ;do
    [[ $PID -gt 1 ]] || continue
    [[ "${1:-$CN}" = "$CN" ]] || continue
    grep -q 'p0::powerfail:/sbin/init 0' ${LXC_SRV}/${CN}/etc/inittab || continue
    kill -PWR $PID
  done
}
lxcstat () {
  typeset -i R=0
  cd $LXC_ETC
  for CF in */config ; do
    CN="${CF%/*}"
    [[ "${1:-$CN}" = "$CN" ]] || continue
    S=`lxc-info -n $CN`
    echo "$S"
    [[ "${S##* }" = "RUNNING" ]] && ((R++))
  done
  [[ $R -gt 0 ]] && return 0 || return 3
}
rc_reset
case "$1" in
  start)
    echo -n "Starting LXC containers..."
    lxcstrt $2
    rc_status -v
    ;;
  stop)
    echo -n "Shutting down LXC containers..."
    lxcstop $2
    while $0 status $2 >/dev/null 2>&1 ; do sleep 2 ; done
    rc_status -v
    ;;
  try-restart)
    $0 status && $0 restart || rc_reset
    rc_status
    ;;
  restart)
    $0 stop $2
    $0 start $2
    rc_status
    ;;
  status)
    echo -n "Checking for LXC containers..."
    lxcstat $2 >/dev/null 2>&1
    rc_status -v
    ;;
  info|list|show)
    echo "Listing LXC containers..."
    lxcstat $2
    ;;
  *)
    echo "Usage: $0 {start|stop|try-restart|restart|status|list} [container_name]"
    exit 1
    ;;
esac
rc_exit
_______________________________________________
Lxc-users mailing list
Lxc-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lxc-users