Ryan Harper has proposed merging 
~raharper/curtin:vmtest/enable-kernel-crashdump into curtin:master.

Commit message:
vmtest: trigger guest panic to fail fast

A number of vmtest scenarios trigger stuck or hung kernels and leave
the VM running in that state, where it continues to consume resources
on the host and prolongs the total time for a complete vmtest run.
This patch reconfigures the guest kernel to panic on soft lockups, NMI
watchdog misses, and hung tasks, and configures QEMU to exit when a
reboot occurs.

The combination ensures that when a guest cannot make progress we fail
fast and exit.  The default install timeout is 3000 seconds.  In the
case of a failure we now exit immediately, record the failure, and
move on to the next test instead of burning resources for the
remaining portion of the timeout.  This will dramatically reduce the
total time to complete a run, since we typically see an install
failure in the first 300 seconds or so.
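
As a rough illustration of the mechanism described above, the sketch
below assembles the extra xkvm arguments in one place.  The helper name
`fail_fast_args` is hypothetical and not part of the patch; the argument
values themselves are copied from the diff that follows.

```python
def fail_fast_args(crashdump=False):
    """Return the extra xkvm arguments that make a stuck guest exit.

    Hypothetical helper for illustration only; the patch extends the
    command list inline rather than through a function like this.
    """
    args = [
        "--no-reboot",                    # xkvm passes -no-reboot to QEMU
        "--append=panic=-1",              # reboot immediately on panic
        "--append=softlockup_panic=1",    # panic on soft lockups
        "--append=hung_task_panic=1",     # panic on hung tasks
        "--append=nmi_watchdog=panic,1",  # panic on NMI watchdog misses
    ]
    if crashdump:
        # reserve memory for the crash kernel, as in the patch
        args.append("--append=crashkernel=384M-5000M:192M")
    return args


if __name__ == "__main__":
    print(" ".join(fail_fast_args()))
```

Because QEMU exits instead of rebooting, a guest panic ends the VM
process rather than leaving it to run until the install timeout.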

For test cases which have proven to be a challenge, we can optionally
enable the 'crashdump' flag in a VMTest class, which modifies the VM
to enable the Linux kernel crashdump feature; if a panic occurs, lkcd
then triggers a dump and captures more debugging state.  This is
disabled by default; there are some bugs in configuring/enabling
crashdump "live" in the ephemeral environment, so we'll only turn this
on for hard-to-debug crashes.
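
A minimal sketch of how a test class might opt in, assuming a
simplified stand-in for VMBaseClass.  The `guest_mem` method, the
1024 MiB default, and the test class name are hypothetical; only the
`crashdump` attribute and the 2048 MiB bump come from the patch.

```python
class VMBaseClass:
    """Simplified stand-in for tests.vmtests.VMBaseClass (illustration)."""
    crashdump = False  # attribute added by the patch, off by default
    mem = 1024         # MiB; hypothetical default for this sketch

    @classmethod
    def guest_mem(cls):
        # Hypothetical helper: the patch bumps cls.mem by 2048 MiB inline
        # so the ephemeral environment can hold a kernel and its modules.
        if cls.crashdump:
            return int(cls.mem) + 2048
        return int(cls.mem)


class TestHardToDebugCrash(VMBaseClass):
    """Hypothetical test class that opts in to crashdump collection."""
    crashdump = True


if __name__ == "__main__":
    print(VMBaseClass.guest_mem(), TestHardToDebugCrash.guest_mem())
```

Keeping the flag as a class attribute means existing test classes are
unaffected unless they set it explicitly.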




Requested reviews:
  curtin developers (curtin-dev)

For more details, see:
https://code.launchpad.net/~raharper/curtin/+git/curtin/+merge/383805
-- 
Your team curtin developers is requested to review the proposed merge of 
~raharper/curtin:vmtest/enable-kernel-crashdump into curtin:master.
diff --git a/examples/tests/crashdump.cfg b/examples/tests/crashdump.cfg
new file mode 100644
index 0000000..e010961
--- /dev/null
+++ b/examples/tests/crashdump.cfg
@@ -0,0 +1,19 @@
+_install_crashdump:
+ - &install_crashdump |
+   command -v apt &>/dev/null && {
+       DEBIAN_FRONTEND=noninteractive apt-get -qy install linux-image-generic
+       debconf-set-selections <<< "kexec-tools  kexec-tools/load_kexec  boolean true"
+       debconf-set-selections <<< "kdump-tools  kdump-tools/use_kdump  boolean true"
+       DEBIAN_FRONTEND=noninteractive apt-get -qy install linux-crashdump;
+       mkdir -p /var/lib/kdump
+       # fix up crashdump post-inst to just put all of the modules in
+       sed -i -e 's,MODULES=dep,MODULES=most,' /etc/kernel/postinst.d/kdump-tools
+       kdump-config load
+       kdump-config show
+    }
+    exit 0
+
+
+early_commands:
+  # run before other install commands
+  0000_aaaa_install_crashdump: ['bash', '-c', *install_crashdump]
diff --git a/tests/vmtests/__init__.py b/tests/vmtests/__init__.py
index 222adcc..e102b6d 100644
--- a/tests/vmtests/__init__.py
+++ b/tests/vmtests/__init__.py
@@ -601,6 +601,7 @@ class VMBaseClass(TestCase):
     arch_skip = []
     boot_timeout = BOOT_TIMEOUT
     collect_scripts = []
+    crashdump = False
     extra_collect_scripts = []
     conf_file = "examples/tests/basic.yaml"
     nr_cpus = None
@@ -967,6 +968,25 @@ class VMBaseClass(TestCase):
                     for service in ["systemd.mask=snapd.seeded.service",
                                     "systemd.mask=snapd.service"]])
 
+        # We set guest kernel panic=-1 to trigger an immediate reboot, which
+        # combined with the (xkvm) -no-reboot qemu parameter should prevent
+        # vmtests from wasting time in a soft-lockup loop. Add the params after
+        # the '---' separator to extend them to the target system as well.
+        cmd.extend(["--no-reboot", "--append=panic=-1",
+                    "--append=softlockup_panic=1",
+                    "--append=hung_task_panic=1",
+                    "--append=nmi_watchdog=panic,1"])
+
+        # configure guest with crashdump to capture kernel failures for debug
+        if cls.crashdump:
+            # we need to install a kernel and modules so bump the memory by 2g
+            # for the ephemeral environment to hold it all
+            cls.mem = int(cls.mem) + 2048
+            logger.info(
+                'Enabling linux-crashdump during install, mem += 2048 = %s',
+                cls.mem)
+            cmd.extend(["--append=crashkernel=384M-5000M:192M"])
+
         # getting resolvconf configured is only fixed in bionic
         # the iscsi_auto handles resolvconf setup via call to
         # configure_networking in initramfs
@@ -1353,7 +1373,7 @@ class VMBaseClass(TestCase):
         target_disks.extend([output_disk])
 
         # create xkvm cmd
-        cmd = (["tools/xkvm", "-v", dowait] +
+        cmd = (["tools/xkvm", "-v", dowait, '--no-reboot'] +
                uefi_flags + netdevs +
                cls.mpath_diskargs(target_disks + extra_disks + nvme_disks) +
                ["--disk=file=%s,if=virtio,media=cdrom" % cls.td.seed_disk] +
@@ -2111,6 +2131,7 @@ def check_install_log(install_log, nrchars=200):
     # regexps expected in curtin output
     install_pass = INSTALL_PASS_MSG
     install_fail = "({})".format("|".join([
+                   'INFO:.* blocked for more than.*seconds.',
                    'Installation failed',
                    'ImportError: No module named.*',
                    'Out of memory:',
diff --git a/tools/launch b/tools/launch
index db18c80..b49dd76 100755
--- a/tools/launch
+++ b/tools/launch
@@ -50,6 +50,7 @@ Usage: ${0##*/} [ options ] curtin install [args]
            --serial-log F  : log to F (default 'serial.log')
            --root-arg X pass 'X' through as the root= param when booting a
                         kernel.  default: $DEFAULT_ROOT_PARAM
+           --no-reboot  Pass '-no-reboot' through to QEMU
       -v | --verbose    be more verbose
            --no-install-deps  do not install insert '--install-deps'
                               on curtin command invocations
@@ -408,7 +409,7 @@ get_img_fmt() {
 
 main() {
     local short_opts="a:A:d:h:i:k:n:p:s:v"
-    long_opts="add:,append:,arch:,bios:,boot-image:,disk:,dowait,help,initrd:,kernel:,mem:,netdev:,no-dowait,no-proxy-config,power:,publish:,root-arg:,silent,serial-log:,smp:,uefi-nvram:,verbose,vnc:"
+    long_opts="add:,append:,arch:,bios:,boot-image:,disk:,dowait,help,initrd:,kernel:,mem:,netdev:,no-dowait,no-proxy-config,no-reboot,power:,publish:,root-arg:,silent,serial-log:,smp:,uefi-nvram:,verbose,vnc:"
     local getopt_out=""
     getopt_out=$(getopt --name "${0##*/}" \
         --options "${short_opts}" --long "${long_opts}" -- "$@") &&
@@ -461,6 +462,7 @@ main() {
                --no-dowait) pt[${#pt[@]}]="$cur"; dowait=false;;
                --no-install-deps) install_deps="";;
                --no-proxy-config) proxy_config=false;;
+               --no-reboot) pt[${#pt[@]}]="--no-reboot";;
                --power)
                 case "$next" in
                     off) pstate="poweroff";;
diff --git a/tools/xkvm b/tools/xkvm
index 4bb4343..02b9f62 100755
--- a/tools/xkvm
+++ b/tools/xkvm
@@ -339,7 +339,7 @@ get_bios_opts() {
 
 main() {
     local short_opts="hd:n:v"
-    local long_opts="bios:,help,dowait,disk:,dry-run,kvm:,no-dowait,netdev:,uefi,uefi-nvram:,verbose"
+    local long_opts="bios:,help,dowait,disk:,dry-run,kvm:,no-dowait,no-reboot,netdev:,uefi,uefi-nvram:,verbose"
     local getopt_out=""
     getopt_out=$(getopt --name "${0##*/}" \
         --options "${short_opts}" --long "${long_opts}" -- "$@") &&
@@ -371,6 +371,7 @@ main() {
     #  We default to dowait=false if input and output are a terminal
     local dowait=""
     [ -t 0 -a -t 1 ] && dowait=false || dowait=true
+    local noreboot=false
     while [ $# -ne 0 ]; do
         cur=${1}; next=${2};
         case "$cur" in
@@ -384,6 +385,7 @@ main() {
             -v|--verbose) VERBOSITY=$((${VERBOSITY}+1));;
             --dowait) dowait=true;;
             --no-dowait) dowait=false;;
+            --no-reboot) noreboot=true;;
             --bios) bios="$next"; shift;;
             --uefi) uefi=true;;
             --uefi-nvram) uefi=true; uefi_nvram="$next"; shift;;
@@ -683,6 +685,10 @@ main() {
     local rng_devices
     rng_devices=( -object "rng-random,filename=/dev/urandom,id=objrng0"
                   -device "$virtio_rng_device,rng=objrng0,id=rng0" )
+    local reboot_arg
+    if $noreboot; then
+        kvmcmd=( "${kvmcmd[@]}" -no-reboot )
+    fi
     cmd=( "${kvmcmd[@]}" "${archopts[@]}"
           "${bios_opts[@]}"
           "${bus_devices[@]}"