On 07/12/17(Thu) 08:34, Landry Breuil wrote:
> Hi,
> 
> i've been having kvm VMs running 6.2 hardlocking/deadlocking since a
> while, all those running on proxmox 5.1 using linux 4.13.8 & qemu-kvm
> 2.9.1. There were hardlocks upon reboot which were 'solved' by disabling
> x2apic emulation in kvm (args: -cpu=kvm64,-x2apic) or giving the host
> cpu flags to the vm (args: -cpu host) but there still remains deadlocks
> during normal operation.
> 
> I'm now running a kernel with MP_LOCKDEBUG, so i'm collecting traces in
> the vain hope that it might help someone interested in locking issues.
> Here's the latest one:

Let me add that when you had x2apic enabled the kernel 'froze' inside
x2apic_readreg, trace below:

  ddb{0}> tr
  x2apic_readreg(10) at x2apic_readreg+0xf
  lapic_delay(ffff800022136900) at lapic_delay+0x5c
  rtcput(ffff800022136960) at rtcput+0x65
  resettodr() at resettodr+0x1d6
  perform_resettodr(ffffffff81769b29) at perform_resettodr+0x9
  taskq_thread(0) at taskq_thread+0x67
  end trace frame: 0x0, count: -6

What you're seeing with a MP_LOCKDEBUG kernel is just a symptom.  A CPU
enters DDB because another one is 'frozen' while holding the
KERNEL_LOCK().  What's interesting is that in both case the frozen CPU
is trying to execute apic related code:
  - x2apic_readreg
  - lapic_delay

I believe this issue should be reported to KVM developers as well.

> login: __mp_lock(0xffffffff81b23ed0): lock spun outStopped at  db_enter+0x5:  
>  popq    %rbp
> ddb{2}> trace
> db_enter() at db_enter+0x5
> ___mp_lock(40) at ___mp_lock+0x66
> syscall() at syscall+0x1ff
> --- syscall (number 4) ---
> end of kernel
> end trace frame: 0xb3b20d05fe0, count: -3
> 0xb3ba081bbfa:
> ddb{2}> mach ddbcpu 1
> Stopped at      x86_ipi_db+0x5: popq    %rbp
> ddb{1}> tr
> x86_ipi_db(ffff800022136530) at x86_ipi_db+0x5
> x86_ipi_handler() at x86_ipi_handler+0x6b
> Xresume_lapic_ipi() at Xresume_lapic_ipi+0x1f
> --- interrupt ---
> Bad frame pointer: 0xfffffe4fe85250cc
> end trace frame: 0xfffffe4fe85250cc, count: -3
> 0x41cb8c419c524153:
> ddb{1}> ps /o
>     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
>  216909  39313    726           0  0x4000000    2  rrdcached
> *366090  74010      0     0x14000      0x200    1  systq
> ddb{1}> tr /p 0t366090
> lapic_delay(ffffffff81a97c88) at lapic_delay+0x5c
> rtcget(ffffffff81a97c88) at rtcget+0x1a
> resettodr() at resettodr+0x3a
> perform_resettodr(ffffffff8150a7d9) at perform_resettodr+0x9
> taskq_thread(0) at taskq_thread+0x67
> end trace frame: 0x0, count: -5
> ddb{1}> tr /p 0t216909
> uvm_fault(0xffffffff81b526f8, 0x0, 0, 1) -> e
> kernel: double fault trap, code=0
> Faulted in DDB; continuing...
> 
> if i continue at that point or try bo sync, ddb is kaput and i can only stop
> the vm. Of course an option would be to run bsd.sp, but that's a bit .. sad.
> 
> dmesg attached.
> 
> Landry

> OpenBSD 6.2-stable (GENERIC.MP) #0: Fri Dec  1 11:27:09 CET 2017
>     [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 17162944512 (16367MB)
> avail mem = 16635760640 (15865MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf68d0 (10 entries)
> bios0: vendor SeaBIOS version 
> "rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org" date 04/01/2014
> bios0: QEMU Standard PC (i440FX + PIIX, 1996)
> acpi0 at bios0: rev 0
> acpi0: sleep states S3 S4 S5
> acpi0: tables DSDT FACP APIC HPET
> acpi0: wakeup devices
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz, 286.92 MHz
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,HTT,SSE3,PCLMUL,SSSE3,FMA3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,FSGSBASE,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,RDSEED,ADX,SMAP,ARAT
> cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 
> 64b/line 16-way L2 cache
> cpu0: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
> cpu0: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 999MHz
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz, 472.57 MHz
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,HTT,SSE3,PCLMUL,SSSE3,FMA3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,FSGSBASE,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,RDSEED,ADX,SMAP,ARAT
> cpu1: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 
> 64b/line 16-way L2 cache
> cpu1: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
> cpu1: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
> cpu1: smt 0, core 1, package 0
> cpu2 at mainbus0: apid 2 (application processor)
> cpu2: Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz, 465.40 MHz
> cpu2: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,HTT,SSE3,PCLMUL,SSSE3,FMA3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,FSGSBASE,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,RDSEED,ADX,SMAP,ARAT
> cpu2: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 
> 64b/line 16-way L2 cache
> cpu2: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
> cpu2: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
> cpu2: smt 0, core 2, package 0
> cpu3 at mainbus0: apid 3 (application processor)
> cpu3: Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz, 467.31 MHz
> cpu3: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,HTT,SSE3,PCLMUL,SSSE3,FMA3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,FSGSBASE,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,RDSEED,ADX,SMAP,ARAT
> cpu3: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 
> 64b/line 16-way L2 cache
> cpu3: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
> cpu3: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
> cpu3: smt 0, core 3, package 0
> ioapic0 at mainbus0: apid 0 pa 0xfec00000, version 11, 24 pins
> acpihpet0 at acpi0: 100000000 Hz
> acpiprt0 at acpi0: bus 0 (PCI0)
> acpicpu0 at acpi0: C1(@1 halt!)
> acpicpu1 at acpi0: C1(@1 halt!)
> acpicpu2 at acpi0: C1(@1 halt!)
> acpicpu3 at acpi0: C1(@1 halt!)
> "ACPI0006" at acpi0 not configured
> "PNP0F13" at acpi0 not configured
> "PNP0700" at acpi0 not configured
> "PNP0A06" at acpi0 not configured
> "PNP0A06" at acpi0 not configured
> "PNP0A06" at acpi0 not configured
> "QEMU0002" at acpi0 not configured
> "ACPI0010" at acpi0 not configured
> pvbus0 at mainbus0: KVM
> pci0 at mainbus0 bus 0
> pchb0 at pci0 dev 0 function 0 "Intel 82441FX" rev 0x02
> pcib0 at pci0 dev 1 function 0 "Intel 82371SB ISA" rev 0x00
> pciide0 at pci0 dev 1 function 1 "Intel 82371SB IDE" rev 0x00: DMA, channel 0 
> wired to compatibility, channel 1 wired to compatibility
> pciide0: channel 0 disabled (no drives)
> pciide0: channel 1 disabled (no drives)
> uhci0 at pci0 dev 1 function 2 "Intel 82371SB USB" rev 0x01: apic 0 int 11
> piixpm0 at pci0 dev 1 function 3 "Intel 82371AB Power" rev 0x03: apic 0 int 9
> iic0 at piixpm0
> vga1 at pci0 dev 2 function 0 "Bochs VGA" rev 0x02
> wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
> wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
> virtio0 at pci0 dev 3 function 0 "Qumranet Virtio Memory" rev 0x00
> viomb0 at virtio0
> virtio0: apic 0 int 11
> virtio1 at pci0 dev 10 function 0 "Qumranet Virtio Storage" rev 0x00
> vioblk0 at virtio1
> scsibus1 at vioblk0: 2 targets
> sd0 at scsibus1 targ 0 lun 0: <VirtIO, Block Device, > SCSI3 0/direct fixed
> sd0: 153600MB, 512 bytes/sector, 314572800 sectors
> virtio1: msix shared
> virtio2 at pci0 dev 18 function 0 "Qumranet Virtio Network" rev 0x00
> vio0 at virtio2: address 46:ae:54:5a:0d:80
> virtio2: msix shared
> ppb0 at pci0 dev 30 function 0 "Red Hat Qemu PCI-PCI" rev 0x00
> pci1 at ppb0 bus 1
> ppb1 at pci0 dev 31 function 0 "Red Hat Qemu PCI-PCI" rev 0x00
> pci2 at ppb1 bus 2
> isa0 at pcib0
> isadma0 at isa0
> fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
> com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
> com0: console
> pckbc0 at isa0 port 0x60/5 irq 1 irq 12
> pckbd0 at pckbc0 (kbd slot)
> wskbd0 at pckbd0: console keyboard, using wsdisplay0
> pms0 at pckbc0 (aux slot)
> wsmouse0 at pms0 mux 0
> pcppi0 at isa0 port 0x61
> spkr0 at pcppi0
> usb0 at uhci0: USB revision 1.0
> uhub0 at usb0 configuration 1 interface 0 "Intel UHCI root hub" rev 1.00/1.00 
> addr 1
> uhidev0 at uhub0 port 1 configuration 1 interface 0 "QEMU QEMU USB Tablet" 
> rev 2.00/0.00 addr 2
> uhidev0: iclass 3/0
> ums0 at uhidev0: 3 buttons, Z dir
> wsmouse1 at ums0 mux 0
> vscsi0 at root
> scsibus2 at vscsi0: 256 targets
> softraid0 at root
> scsibus3 at softraid0: 256 targets
> root on sd0a (0a797a44a76827db.a) swap on sd0b dump on sd0b
> fd0 at fdc0 drive 1: density unknown

Reply via email to