Hi all,
I have run into some problems while using KVM.
The test environment is:
Summary: Dell R610, 1 x Xeon E5645 2.40GHz, 47.1GB / 48GB 1333MHz DDR3
System: Dell PowerEdge R610 (Dell 08GXHX)
Processors: 1 (of 2) x Xeon E5645 2.40GHz 5860MHz FSB (HT enabled,
6 cores, 24 threads)
Memory: 47.1GB / 48GB 1333MHz DDR3 == 12 x 4GB
Disk: sda: 299GB (72%) JBOD
Disk: sdb (host9): 5.0TB JBOD == 1 x VIRTUAL-DISK
Disk: sdc (host11): 5.0TB JBOD == 1 x VIRTUAL-DISK
Disk: sdd (host12): 5.0TB JBOD == 1 x VIRTUAL-DISK
Disk: sde (host10): 5.0TB JBOD == 1 x VIRTUAL-DISK
Disk-Control: mpt2sas0: LSI Logic / Symbios Logic SAS2008
PCI-Express Fusion-MPT SAS-2 [Falcon]
Disk-Control: host9:
Disk-Control: host10:
Disk-Control: host11:
Disk-Control: host12:
Chipset: Intel 82801IB (ICH9)
Network: br1 (bridge): 14:fe:b5:dc:2c:6e
Network: em1 (bnx2): Broadcom NetXtreme II BCM5709 Gigabit,
14:fe:b5:dc:2c:6e, 1000Mb/s <full-duplex>
Network: em2 (bnx2): Broadcom NetXtreme II BCM5709 Gigabit,
14:fe:b5:dc:2c:70, 1000Mb/s <full-duplex>
Network: em3 (bnx2): Broadcom NetXtreme II BCM5709 Gigabit,
14:fe:b5:dc:2c:72, 1000Mb/s <full-duplex>
Network: em4 (bnx2): Broadcom NetXtreme II BCM5709 Gigabit,
14:fe:b5:dc:2c:74, 1000Mb/s <full-duplex>
Network: vnet0 (tun): fe:16:3e:49:fb:05, 10Mb/s <full-duplex>
Network: vnet1 (tun): fe:16:3e:cb:c0:d1, 10Mb/s <full-duplex>
Network: vnet2 (tun): fe:16:3e:1e:c1:c4, 10Mb/s <full-duplex>
Network: vnet3 (tun): fe:16:3e:d5:58:f4, 10Mb/s <full-duplex>
Network: vnet4 (tun): fe:16:3e:15:b4:16, 10Mb/s <full-duplex>
Network: vnet5 (tun): fe:16:3e:d2:07:47, 10Mb/s <full-duplex>
Network: vnet6 (tun): fe:16:3e:e1:2b:b9, 10Mb/s <full-duplex>
OS: RHEL Server 6.1 (Santiago), Linux
2.6.32-220.2.1.el6.x86_64 x86_64, 64-bit
BIOS: Dell 3.0.0 01/31/2011
While running KVM on this host, I have hit the following issues:
1. Host crashes, caused by:
a. Kernel panic
KERNEL: /usr/lib/debug/lib/modules/2.6.32-131.12.1.el6.x86_64/vmlinux
DUMPFILE: ../vmcore_2012.13.46 [PARTIAL DUMP]
CPUS: 24
DATE: Wed Jan 11 13:34:13 2012
UPTIME: 25 days, 04:11:05
LOAD AVERAGE: 223.16, 172.97, 158.23
TASKS: 1464
NODENAME: dell2.localdomain
RELEASE: 2.6.32-131.12.1.el6.x86_64
VERSION: #1 SMP Sun Jul 31 16:44:56 EDT 2011
MACHINE: x86_64 (2394 Mhz)
MEMORY: 48 GB
PANIC: "kernel BUG at arch/x86/kernel/traps.c:547!"
PID: 11851
COMMAND: "qemu-kvm"
TASK: ffff880c071c3500 [THREAD_INFO: ffff880c132d8000]
CPU: 1
STATE: TASK_RUNNING (PANIC)

PID: 11851 TASK: ffff880c071c3500 CPU: 1 COMMAND: "qemu-kvm"
#0 [ffff880028207be0] machine_kexec at ffffffff810310cb
#1 [ffff880028207c40] crash_kexec at ffffffff810b6392
#2 [ffff880028207d10] oops_end at ffffffff814de670
#3 [ffff880028207d40] die at ffffffff8100f2eb
#4 [ffff880028207d70] do_trap at ffffffff814ddf64
#5 [ffff880028207dd0] do_invalid_op at ffffffff8100ceb5
#6 [ffff880028207e70] invalid_op at ffffffff8100bf5b
[exception RIP: do_nmi+554]
RIP: ffffffff814de43a RSP: ffff880028207f28 RFLAGS: 00010002
RAX: ffff880c132d9fd8 RBX: ffff880028207f58 RCX: 00000000c0000101
RDX: 00000000ffff8800 RSI: ffffffffffffffff RDI: ffff880028207f58
RBP: ffff880028207f48 R8: ffff88005ebf9800 R9: ffff880028203fc0
R10: 0000000000000034 R11: 00000000000003e8 R12: 000000000000cc20
R13: ffffffff816024a0 R14: ffff88005ebf9800 R15: 00007ffffffff000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffff880028207f50] nmi at ffffffff814ddc90
[exception RIP: bad_to_user+37]
RIP: ffffffff814e4e2b RSP: ffff880028207bb0 RFLAGS: 00010046
RAX: ffff880c132d9fd8 RBX: ffff880c132d9c48 RCX: 0000000000000001
RDX: 0000000000000000 RSI: 000000010000000b RDI: ffff880028207c08
RBP: ffff880028207c48 R8: ffff88005ebf9800 R9: ffff880028203fc0
R10: 0000000000000034 R11: 00000000000003e8 R12: 000000000000cc20
R13: ffffffff816024a0 R14: ffff88005ebf9800 R15: 00007ffffffff000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
For this problem, I found that the panic is triggered by
BUG_ON(in_nmi()), which means an NMI arrived while the CPU was already
in NMI context. However, the Intel manual states: "While an NMI
interrupt handler is executing, the processor disables additional
calls to the NMI handler until the next IRET instruction is executed."
So how can this happen?
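To make the two statements above concrete, here is a toy Python model of one CPU. It is only a sketch: the class and method names (ToyCPU, raise_nmi, iret and so on) are made up for illustration and are not kernel or hardware interfaces. It models the hardware latch described in the Intel quote and a software flag standing in for in_nmi(). The second scenario, an IRET executed inside the NMI handler after some exception, is a hypothetical sequence in which the check could fire; nothing in the dump above confirms that this is what happened here.

```python
class NestedNMIBug(Exception):
    """Stands in for BUG_ON(in_nmi()) firing on NMI entry."""


class ToyCPU:
    """Toy model of one CPU's NMI handling; not real kernel or hardware code."""

    def __init__(self):
        self.nmi_blocked = False  # hardware latch: set on NMI delivery, cleared by IRET
        self.in_nmi = False       # software flag, standing in for in_nmi()
        self.pending_nmi = False  # an NMI that arrived while the latch was set

    def raise_nmi(self):
        """An NMI source fires. Returns True if the handler was entered."""
        if self.nmi_blocked:
            self.pending_nmi = True  # held back until the next IRET
            return False
        self.nmi_blocked = True
        self._nmi_enter()
        return True

    def _nmi_enter(self):
        # Software-side check on handler entry.
        if self.in_nmi:
            raise NestedNMIBug("NMI entered while already in NMI context")
        self.in_nmi = True

    def nmi_exit_and_iret(self):
        """Normal end of the handler: leave NMI context, then IRET."""
        self.in_nmi = False
        self.iret()

    def iret(self):
        """Any IRET clears the latch, including one that returns from an
        exception taken inside the NMI handler."""
        self.nmi_blocked = False
        if self.pending_nmi:
            self.pending_nmi = False
            self.raise_nmi()


def well_behaved():
    cpu = ToyCPU()
    cpu.raise_nmi()                    # first NMI: handler entered
    second_entered = cpu.raise_nmi()   # second NMI: latched, not delivered
    cpu.nmi_exit_and_iret()            # handler done; pending NMI is delivered now
    return second_entered, cpu.in_nmi


def iret_inside_handler():
    cpu = ToyCPU()
    cpu.raise_nmi()                    # first NMI: handler entered
    cpu.raise_nmi()                    # second NMI arrives and is held pending
    try:
        cpu.iret()                     # hypothetical IRET before the handler finished
    except NestedNMIBug:
        return True                    # the in_nmi() check fired
    return False
```

In this model, well_behaved() never trips the check because the second NMI is only delivered after the handler has left NMI context, while iret_inside_handler() trips it because the latch is released while the software flag is still set.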
b. CPU hard lockup in a qemu-kvm process
KERNEL: /usr/lib/debug/lib/modules/2.6.32-131.12.1.el6.x86_64/vmlinux
DUMPFILE: /var/crash/127.0.0.1-2012-02-18-21:20:13/vmcore [PARTIAL DUMP]
CPUS: 24
DATE: Sat Feb 18 20:03:56 2012
UPTIME: 71 days, 09:42:23
LOAD AVERAGE: 46.81, 44.32, 35.15
TASKS: 1018
NODENAME: virt15-njhx-kvm-19
RELEASE: 2.6.32-131.12.1.el6.x86_64
VERSION: #1 SMP Sun Jul 31 16:44:56 EDT 2011
MACHINE: x86_64 (2394 Mhz)
MEMORY: 48 GB
PANIC: "Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 12"
PID: 18704
COMMAND: "qemu-kvm"
TASK: ffff880041efb580 [THREAD_INFO: ffff8807309ba000]
CPU: 12
STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 18704 TASK: ffff880041efb580 CPU: 12 COMMAND: "qemu-kvm"
#0 [ffff8806454c7af0] machine_kexec at ffffffff810310cb
#1 [ffff8806454c7b50] crash_kexec at ffffffff810b6392
#2 [ffff8806454c7c20] panic at ffffffff814da64f
#3 [ffff8806454c7ca0] watchdog_overflow_callback at ffffffff810d648d
#4 [ffff8806454c7cc0] __perf_event_overflow at ffffffff81108b26
#5 [ffff8806454c7d60] perf_event_overflow at ffffffff81109119
#6 [ffff8806454c7d70] intel_pmu_handle_irq at ffffffff8101dd46
#7 [ffff8806454c7e80] perf_event_nmi_handler at ffffffff814debd8
#8 [ffff8806454c7ea0] notifier_call_chain at ffffffff814e0735
#9 [ffff8806454c7ee0] atomic_notifier_call_chain at ffffffff814e079a
#10 [ffff8806454c7ef0] notify_die at ffffffff8109411e
#11 [ffff8806454c7f20] do_nmi at ffffffff814de383
#12 [ffff8806454c7f50] nmi at ffffffff814ddc90
RIP: 00000000004083ab RSP: 00007fffc80115d8 RFLAGS: 00000206
RAX: 000000007e2bf790 RBX: 0000000001c753f0 RCX: 0000000000008000
RDX: 0000000000000000 RSI: 0000093b76bfc600 RDI: 000000001277546d
RBP: 0000000000000200 R8: 00000000fbc80000 R9: 0000000000000000
R10: 0000000000000064 R11: 0000000000000246 R12: 1277546d7d3d8c69
R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b
--- <NMI exception stack> ---
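For readers unfamiliar with the path in this trace (nmi, perf_event_nmi_handler, watchdog_overflow_callback, panic), the following Python sketch shows the general idea of a hard lockup check as I understand it from the upstream watchdog design: a periodic timer interrupt bumps a per-CPU counter, and a periodic NMI declares a lockup if the counter has not moved since the previous NMI. All names here (ToyHardLockupDetector, timer_ticks, simulate) are hypothetical, and the exact logic in this RHEL kernel may differ.

```python
class ToyHardLockupDetector:
    """Toy model of a per-CPU hard lockup check; names are illustrative."""

    def __init__(self):
        self.timer_ticks = 0  # bumped by the periodic timer interrupt
        self.ticks_seen = 0   # value observed at the previous NMI check

    def timer_interrupt(self):
        """Runs only if the CPU is still servicing normal interrupts."""
        self.timer_ticks += 1

    def nmi_check(self):
        """Runs from the periodic NMI. Returns True if a lockup is declared."""
        if self.timer_ticks == self.ticks_seen:
            return True       # no timer interrupt ran since the last check
        self.ticks_seen = self.timer_ticks
        return False


def simulate(irqs_serviced_per_period):
    """Each element says whether the timer interrupt ran in that period.
    Returns the index of the period in which a lockup is declared, or None."""
    detector = ToyHardLockupDetector()
    for period, irqs_serviced in enumerate(irqs_serviced_per_period):
        if irqs_serviced:
            detector.timer_interrupt()
        if detector.nmi_check():
            return period
    return None
```

For example, simulate([True, True, True]) reports no lockup, while simulate([True, False, True]) reports one in the second period, where no timer interrupt ran between two NMI checks.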
2. Guest boot hangs when libvirt processes many guest-creation
requests at the same time.
The guests are configured with -smp 1.
Does anyone have any idea about these issues?
Thanks