Added a qemu-kvm task since this only occurs with KVM guests.

** Description changed:

  After upgrading to natty's kernel I noticed that my VMs would sometimes
  become highly unstable, with random guest applications segfaulting and
  crashing in weird ways. This seems to be more pronounced when running
  more than one VM at a time. This does not seem to be a hardware issue--
  the host is a 6 month old laptop and I ran memtest86 for 12 hours with
  18 successful completions and no errors. There is no host instability or
  messages in dmesg that I could see that would indicate a host problem.
  Downgrading to the maverick kernel fixes this problem. I have a script
  that will launch 10 VMs and run some commands:
  
  #!/bin/sh
  count=0
  while /bin/true ; do
      count=$(( $count + 1 ))
      echo "RUN $count"
      vm-stop -f -p sec
      sleep 3
      vm-start -s -v -p sec
      sleep 15
      vm-cmd -c -r -p sec apt-get update
      vm-cmd -c -r -p sec apt-get -y --force-yes dist-upgrade

      vm-cmd -c -r -p sec apt-get -y --force-yes install chromium-browser
      vm-cmd -c -r -p sec apt-get -y --force-yes remove --purge chromium-browser*

      vm-cmd -c -r -p sec apt-get -y --force-yes install chromium-browser
      vm-cmd -c -r -p sec apt-get -y --force-yes remove --purge chromium-browser*

      vm-cmd -c -r -p sec apt-get -y --force-yes install chromium-browser
      vm-cmd -c -r -p sec apt-get -y --force-yes remove --purge chromium-browser*

      vm-cmd -c -r -p sec apt-get -y --force-yes install chromium-browser
      vm-cmd -c -r -p sec apt-get -y --force-yes remove --purge chromium-browser*

      vm-cmd -c -r -p sec apt-get -y --force-yes install chromium-browser
      vm-cmd -c -r -p sec apt-get -y --force-yes remove --purge chromium-browser*
  done
  
  'vm-start' starts 10 VMs via libvirt with snapshotted qcow2 disks, and
  vm-stop kills them off, discarding the snapshot. 'vm-cmd' will ssh into
  each machine and run the command for each machine in sequence. The VMs
  themselves are all pristine and are resnapshotted on each loop
  iteration. The point of this explanation is to illustrate that while the
  VMs all start in the same state, they fail differently or sometimes not
  at all. I am able to reproduce guest instability within 4-5 iterations
  of this script on a natty kernel. With the maverick kernel it ran 18
  times with no errors (around 8 hours).
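  
  Roughly, what the helpers do per guest looks like this (a sketch only;
  the actual vm-* scripts are local wrappers, so the image paths, domain
  name, and XML file below are illustrative, not the real ones):
  
      # vm-start: create a throwaway qcow2 overlay backed by the pristine
      # image, then boot a transient libvirt domain that uses the overlay.
      base=/var/lib/libvirt/images/sec-guest.qcow2
      snap=/var/lib/libvirt/images/sec-guest-snap.qcow2
      qemu-img create -f qcow2 -b "$base" "$snap"
      virsh create sec-guest.xml
  
      # vm-cmd: ssh into the guest and run the given command.
      ssh root@sec-guest "apt-get update"
  
      # vm-stop: kill the domain and discard the overlay so the next
      # iteration starts from the pristine image again.
      virsh destroy sec-guest
      rm -f "$snap"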
  
  For example, with the above, I saw a maverick/i386 guest fail once with:
  dpkg: parse error, in file '/var/lib/dpkg/status' near line 5914 package 'libtelepathy-glib0':
   'Depends' field, reference to 'libglib2.0-0': error in version: version string is empty
  
  Another time the maverick/i386 failed with:
  Processing triggers for man-db ...
  dpkg: error processing man-db (--unpack):
   subprocess installed post-installation script killed by signal (Segmentation fault)
  Errors were encountered while processing:
   man-db
  
  A lucid/i386 guest failed another time with:
  Processing triggers for python-gmenu ...
  Rebuilding /usr/share/applications/desktop.en_US.utf8.cache...
  Segmentation fault
  dpkg: error processing python-gmenu (--purge):
   subprocess installed post-installation script returned error exit status 139
  Processing triggers for man-db ...
  Errors were encountered while processing:
   python-gmenu
  
  There are many other failures....
  
  On my laptop I have an i7 with two cores and 4 hyperthreads per core
  (this is the default configuration for this machine from the factory and
  the configuration used to report this bug). I am able to 'disable'
  hyperthreads in the BIOS, and if I do, I end up with 2 cores and 2
  threads per core. In this configuration, I noticed that I don't have to
  run as many VMs to see the problem. I've seen it with as few as 2 VMs
  at a time. I mention this as it seems that the issue is exacerbated when
  the ratio of VMs to CPUs is 1:1 or higher.
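  
  A quick way to check that ratio (a sketch; it assumes the KVM guests
  show up as qemu-system processes on the host, so adjust the pgrep
  pattern if they don't):
  
      threads=$(grep -c ^processor /proc/cpuinfo)
      guests=$(pgrep -c -f qemu-system)
      echo "running guests: $guests, host threads: $threads"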
  
  I can say for certain that the rc6 and rc7 kernels in natty exhibit the
  problem, and maverick's does not. I can also say that the natty kernel
  runs considerably hotter than the maverick kernel, with average
  temperatures being 10-15C higher under load according to
  /proc/acpi/ibm/thermal (I had to buy a 'chill mat' (a laptop mat with 2
  fans) after upgrading to natty). My gut feeling is that it has to do
  with KSM or virtio, but I don't know that for sure. This may be
  chipset-specific, as a colleague was unable to reproduce this on the
  rc6 kernel (but also did not run my script-- only did the
  chromium-browser updates over and over again). I thought it could be
  the high temperatures causing problems, but then why wasn't the host
  having problems? I thought it could be the host RAM, but memtest86 was
  ok, and running the script for 8 hours on maverick fills the RAM and
  swap and there were still no problems in the guests and no problems on
  the host. Based on the above it seems clear to me that something in the
  natty kernel is causing the problem.
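  
  Since KSM is one of the suspects, one test worth noting (a sketch;
  /sys/kernel/mm/ksm/run is the standard KSM control file, but whether
  anything on this host re-enables it afterwards is an assumption):
  
      # See whether KSM is running and how many pages it has merged.
      cat /sys/kernel/mm/ksm/run /sys/kernel/mm/ksm/pages_shared
  
      # As root, turn KSM off (writing 2 instead of 0 also unmerges all
      # currently shared pages), then re-run the reproducer script. If the
      # guest crashes stop, KSM is implicated; if not, virtio is the next
      # suspect.
      echo 0 > /sys/kernel/mm/ksm/run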
  
  ProblemType: Bug
  DistroRelease: Ubuntu 11.04
  Package: linux-image-2.6.37-11-generic 2.6.37-11.25
  Regression: Yes
  Reproducible: Yes
  ProcVersionSignature: Ubuntu 2.6.37-11.25-generic 2.6.37-rc7
  Uname: Linux 2.6.37-11-generic x86_64
  AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.23.
  Architecture: amd64
  ArecordDevices:
   **** List of CAPTURE Hardware Devices ****
   card 0: Intel [HDA Intel], device 0: CONEXANT Analog [CONEXANT Analog]
     Subdevices: 1/1
     Subdevice #0: subdevice #0
  AudioDevicesInUse:
   USER        PID ACCESS COMMAND
   /dev/snd/controlC0:  jamie      2360 F.... pulseaudio
   /dev/snd/pcmC0D0p:   jamie      2360 F...m pulseaudio
  CRDA: Error: [Errno 2] No such file or directory
  Card0.Amixer.info:
   Card hw:0 'Intel'/'HDA Intel at 0xf2520000 irq 43'
     Mixer name : 'Intel IbexPeak HDMI'
     Components : 'HDA:14f15069,17aa2156,00100302 HDA:80862804,17aa21b5,00100000'
     Controls      : 16
     Simple ctrls  : 7
  Card29.Amixer.info:
   Card hw:29 'ThinkPadEC'/'ThinkPad Console Audio Control at EC reg 0x30, fw 6QHT28WW-1.09'
     Mixer name : 'ThinkPad EC 6QHT28WW-1.09'
     Components : ''
     Controls      : 1
     Simple ctrls  : 1
  Card29.Amixer.values:
   Simple mixer control 'Console',0
     Capabilities: pswitch pswitch-joined penum
     Playback channels: Mono
     Mono: Playback [off]
  Date: Thu Dec 23 23:22:11 2010
  EcryptfsInUse: Yes
  HibernationDevice: RESUME=UUID=58280e6e-d161-43ea-8593-a89fb7b6851a
  InstallationMedia: Ubuntu 10.04 LTS "Lucid Lynx" - Release amd64 (20100427.1)
  MachineType: LENOVO 5129CTO
  ProcEnviron:
   LANGUAGE=en_US:en
   PATH=(custom, user)
   LANG=en_US.UTF-8
   LC_MESSAGES=en_US.utf8
   SHELL=/bin/bash
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.37-11-generic root=UUID=82571cfb-fdda-4d2f-b708-f8924aa0fe21 ro vt.handoff=7 quiet splash
  RelatedPackageVersions: linux-firmware 1.44
  SourcePackage: linux
  dmi.bios.date: 04/20/2010
  dmi.bios.vendor: LENOVO
  dmi.bios.version: 6QET44WW (1.14 )
  dmi.board.name: 5129CTO
  dmi.board.vendor: LENOVO
  dmi.board.version: Not Available
  dmi.chassis.asset.tag: No Asset Information
  dmi.chassis.type: 10
  dmi.chassis.vendor: LENOVO
  dmi.chassis.version: Not Available
  dmi.modalias: dmi:bvnLENOVO:bvr6QET44WW(1.14):bd04/20/2010:svnLENOVO:pn5129CTO:pvrThinkPadX201s:rvnLENOVO:rn5129CTO:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
  dmi.product.name: 5129CTO
  dmi.product.version: ThinkPad X201s
  dmi.sys.vendor: LENOVO

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to qemu-kvm in ubuntu.
https://bugs.launchpad.net/bugs/694029

Title:
  [natty] kvm guests become unstable after a while
