Heya,
This is on Intel Haswell.
First, some version info:
L0 and L1 both run the same versions of the kernel and QEMU:
=====
$ rpm -q kernel --changelog | head -2
* Thu May 09 2013 Josh Boyer - 3.10.0-0.rc0.git23.1
- Linux v3.9-11789-ge0fd9af
=====
=====
$ uname -r ; rpm -q qemu-kvm libvirt-daemon-kvm libguestfs
3.10.0-0.rc0.git23.1.fc20.x86_64
qemu-kvm-1.4.1-1.fc19.x86_64
libvirt-daemon-kvm-1.0.5-2.fc19.x86_64
libguestfs-1.21.35-1.fc19.x86_64
=====
Additionally, neither nmi_watchdog nor hpet is set on the L0 or L1 kernel command line:
=====
$ egrep -i 'nmi|hpet' /etc/grub2.cfg
$
=====
KVM parameters on L0:
=====
$ cat /sys/module/kvm_intel/parameters/nested
Y
$ cat /sys/module/kvm_intel/parameters/enable_shadow_vmcs
Y
$ cat /sys/module/kvm_intel/parameters/enable_apicv
N
$ cat /sys/module/kvm_intel/parameters/ept
Y
=====
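To make it easier to compare these settings between L0 and L1, here is a minimal sketch that prints the same four parameters in one go. The function name `show_kvm_params` and its optional directory argument are my own additions for illustration; the default path is the one used above.

```shell
# Print the kvm_intel parameters listed above, one per line.
# The optional first argument (a directory) is a hypothetical convenience
# so the function can be pointed at a copy of the parameters directory.
show_kvm_params() {
    dir="${1:-/sys/module/kvm_intel/parameters}"
    for p in nested enable_shadow_vmcs enable_apicv ept; do
        if [ -r "$dir/$p" ]; then
            printf '%s=%s\n' "$p" "$(cat "$dir/$p")"
        else
            # The module may not be loaded, or the parameter may not exist
            # on this kernel version.
            printf '%s=<not available>\n' "$p"
        fi
    done
}

show_kvm_params
```

Running it at each level and diffing the output should show quickly whether L1 sees a different configuration than L0.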
-> This is the stack trace I see when I start the L2 guest:
------------------------------------------------
.......
[ 2.162235] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
[ 2.163080] Pid: 1, comm: swapper/0 Not tainted 3.8.11-200.fc18.x86_64 #1
[ 2.163080] Call Trace:
[ 2.163080] [<ffffffff81649c19>] panic+0xc1/0x1d0
[ 2.163080] [<ffffffff81d010e0>] mount_block_root+0x1fa/0x2ac
[ 2.163080] [<ffffffff81d011e9>] mount_root+0x57/0x5b
[ 2.163080] [<ffffffff81d0132a>] prepare_namespace+0x13d/0x176
[ 2.163080] [<ffffffff81d00e1c>] kernel_init_freeable+0x1cf/0x1da
[ 2.163080] [<ffffffff81d00610>] ? do_early_param+0x8c/0x8c
[ 2.163080] [<ffffffff81637ca0>] ? rest_init+0x80/0x80
[ 2.163080] [<ffffffff81637cae>] kernel_init+0xe/0xf0
[ 2.163080] [<ffffffff8165bd6c>] ret_from_fork+0x7c/0xb0
[ 2.163080] [<ffffffff81637ca0>] ? rest_init+0x80/0x80
[ 2.163080] Uhhuh. NMI received for unknown reason 30 on CPU 0.
[ 2.163080] Do you have a strange power saving mode enabled?
[ 2.163080] Dazed and confused, but trying to continue
[ 2.163080] Uhhuh. NMI received for unknown reason 20 on CPU 0.
[ 2.163080] Do you have a strange power saving mode enabled?
[ 2.163080] Dazed and confused, but trying to continue
[ 2.163080] Uhhuh. NMI received for unknown reason 30 on CPU 0.
------------------------------------------------
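As far as I understand it, the "reason" value in these NMI messages is the byte the kernel reads from the NMI status port (0x61), printed in hex, and it is reported as unknown when neither the SERR bit (0x80) nor the IOCHK bit (0x40) is set. A minimal sketch to decode the values seen above follows; the function name `decode_nmi_reason` and the labels for bits 4 and 5 are my own choices, so treat this as an aid for reading the log rather than a definitive interpretation.

```shell
# Decode a hex "reason" byte from an "NMI received for unknown reason" line.
# Bit meanings assumed here: 0x80 SERR, 0x40 IOCHK, 0x20 timer 2 output,
# 0x10 refresh toggle. The labels are illustrative.
decode_nmi_reason() {
    val=$((0x$1))
    out=""
    if [ $((val & 0x80)) -ne 0 ]; then out="$out SERR"; fi
    if [ $((val & 0x40)) -ne 0 ]; then out="$out IOCHK"; fi
    if [ $((val & 0x20)) -ne 0 ]; then out="$out timer2-output"; fi
    if [ $((val & 0x10)) -ne 0 ]; then out="$out refresh-toggle"; fi
    echo "0x$1:${out:- none}"
}

decode_nmi_reason 30
decode_nmi_reason 20
```

Neither 30 nor 20 has the SERR or IOCHK bit set, which would be consistent with the kernel not being able to attribute the NMI to a known source.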
I'm able to reproduce this consistently.
L1 QEMU command-line:
====================
$ ps -ef | grep -i qemu
qemu 4962 1 21 15:41 ? 00:00:41
/usr/bin/qemu-system-x86_64 -machine accel=kvm -name regular-guest -S
-machine pc-i440fx-1.4,accel=kvm,usb=off -cpu Haswell,+vmx -m 6144
-smp 4,sockets=4,cores=1,threads=1 -uuid
4ed9ac0b-7f72-dfcf-68b3-e6fe2ac588b2 -nographic -no-user-config
-nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/regular-guest.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
-no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
-drive
file=/home/test/vmimages/regular-guest.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
-netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=24 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:80:c1:34,bus=pci.0,addr=0x3
-chardev pty,id=charserial0 -device
isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
L2 QEMU command-line:
====================
$ qemu 2042 1 0 May09 ? 00:05:03
/usr/bin/qemu-system-x86_64 -machine accel=kvm -name nested-guest -S
-machine pc-i440fx-1.4,accel=kvm,usb=off -m 2048 -smp
2,sockets=2,cores=1,threads=1 -uuid
02ea8988-1054-b08b-bafe-cfbe9659976c -nographic -no-user-config
-nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/nested-guest.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
-no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
-drive
file=/home/test/vmimages/nested-guest.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
-netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=24 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:65:c4:e6,bus=pci.0,addr=0x3
-chardev pty,id=charserial0 -device
isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
I've attached the vmxcap script output.
Before I debug further, does anyone have any hints?
Many thanks in advance.
[1] Notes -- https://github.com/kashyapc/nested-virt-notes-intel-f18
/kashyap
Basic VMX Information
Revision 18
VMCS size 1024
VMCS restricted to 32 bit addresses no
Dual-monitor support yes
VMCS memory type 6
INS/OUTS instruction information yes
IA32_VMX_TRUE_*_CTLS support yes
pin-based controls
External interrupt exiting yes
NMI exiting yes
Virtual NMIs yes
Activate VMX-preemption timer yes
Process posted interrupts no
primary processor-based controls
Interrupt window exiting yes
Use TSC offsetting yes
HLT exiting yes
INVLPG exiting yes
MWAIT exiting yes
RDPMC exiting yes
RDTSC exiting yes
CR3-load exiting default
CR3-store exiting default
CR8-load exiting yes
CR8-store exiting yes
Use TPR shadow yes
NMI-window exiting yes
MOV-DR exiting yes
Unconditional I/O exiting yes
Use I/O bitmaps yes
Monitor trap flag yes
Use MSR bitmaps yes
MONITOR exiting yes
PAUSE exiting yes
Activate secondary control yes
secondary processor-based controls
Virtualize APIC accesses yes
Enable EPT yes
Descriptor-table exiting yes
Enable RDTSCP yes
Virtualize x2APIC mode yes
Enable VPID yes
WBINVD exiting yes
Unrestricted guest yes
APIC register emulation no
Virtual interrupt delivery no
PAUSE-loop exiting yes
RDRAND exiting yes
Enable INVPCID yes
Enable VM functions yes
VMCS shadowing yes
EPT-violation #VE no
VM-Exit controls
Save debug controls default
Host address-space size yes
Load IA32_PERF_GLOBAL_CTRL yes
Acknowledge interrupt on exit yes
Save IA32_PAT yes
Load IA32_PAT yes
Save IA32_EFER yes
Load IA32_EFER yes
Save VMX-preemption timer value yes
VM-Entry controls
Load debug controls default
IA-64 mode guest yes
Entry to SMM yes
Deactivate dual-monitor treatment yes
Load IA32_PERF_GLOBAL_CTRL yes
Load IA32_PAT yes
Load IA32_EFER yes
Miscellaneous data
VMX-preemption timer scale (log2) 5
Store EFER.LMA into IA-32e mode guest control yes
HLT activity state yes
Shutdown activity state yes
Wait-for-SIPI activity state yes
IA32_SMBASE support yes
Number of CR3-target values 4
MSR-load/store count recommenation 0
IA32_SMM_MONITOR_CTL[2] can be set to 1 yes
VMWRITE to VM-exit information fields yes
MSEG revision identifier 0
VPID and EPT capabilities
Execute-only EPT translations yes
Page-walk length 4 yes
Paging-structure memory type UC yes
Paging-structure memory type WB yes
2MB EPT pages yes
1GB EPT pages yes
INVEPT supported yes
EPT accessed and dirty flags yes
Single-context INVEPT yes
All-context INVEPT yes
INVVPID supported yes
Individual-address INVVPID yes
Single-context INVVPID yes
All-context INVVPID yes
Single-context-retaining-globals INVVPID yes
VM Functions
EPTP Switching yes