On Thu, Jun 01, 2017 at 12:58:27PM +0200, Martin Pieuchot wrote:
> On 01/06/17(Thu) 10:47, Martin Pieuchot wrote:
> > On 31/05/17(Wed) 22:43, Gregor Best wrote:
> > > Hi people,
> > >
> > > with a fresh snapshot (according to the dates on the FTP server from
> > > today), and a kernel built from CVS from about 20:15 UTC+1, vmd's child
> > > processes reproducibly enter an infinite loop relatively early after
> > > booting an Alpine Linux VM. The problem is not reproducible when booting
> > > my /bsd.rd.
> >
> > Thanks for reporting the problem, it's certainly related to the new
> > condvard. I reverted librthread to the old implementation for the
> > moment.
>
> Apparently it is not. I tried to track the problem down and even with
> reverted libpthread and a pre-hackathon vmd(8), I have an interrupt
> problem in my guest.
>
I can repro this locally. It's probably a bug that's been hitting us randomly
since Cambridge but I've been unable to nail it down. Certainly timing related.
Since it's now more pronounced, Ill try to track it down over the next few days.
-ml
> In my case I'm trying to install alpine linux using your ISO:
>
> # cat /etc/vmd.conf
> vm "alpine" {
> memory 128M
> disk "/home/vmd/alpine"
> local interface
> disable
> }
> # vmctl start alpine -d alpine-virt-3.6.0-x86_64.iso -c
>
> The host freeze once I typed "setup-alpine":
>
> localhost:~# setup-alpine
> 7% [## ]
>
> A that time one of the threads is doing a crazy loop entering/existing
> thrsleep:
>
> PID TID PRI NICE SIZE RES STATE WAIT TIME CPU COMMAND
> 53152 557765 28 0 515M 10M onproc/2 - 0:12 0.00% vmd
> 53152 193294 2 0 515M 10M sleep/1 kqread 0:01 0.00% vmd
> 53152 123587 28 0 515M 10M idle thrslee 0:00 0.00% vmd
>
> Here's what the two non-IDLE threads are doing in a loop:
>
> 53152/193294 vmd RET kevent 0
> 53152/193294 vmd CALL clock_gettime(CLOCK_MONOTONIC,0x115e7354e140)
> 53152/193294 vmd STRU struct timespec { 2785.967019509 }
> 53152/193294 vmd RET clock_gettime 0
> 53152/193294 vmd CALL ioctl(4,_IOW('V',6,0xc),0x115e7354e0f4)
> 53152/193294 vmd RET ioctl 0
> 53152/193294 vmd CALL __thrwakeup(0x115e6c571638,1)
> 53152/193294 vmd RET __thrwakeup 0
> 53152/193294 vmd CALL
> kevent(5,0x115dee35f000,0,0x115e7971d800,64,0x115e7354e110)
> 53152/193294 vmd STRU struct timespec { 0.009999000 }
> 53152/557765 vmd RET __thrsleep 0
> 53152/557765 vmd CALL ioctl(4,_IOW('V',6,0xc),0x115ebba883c4)
> 53152/557765 vmd RET ioctl 0
> 53152/557765 vmd CALL ioctl(4,PVBUSIOC_KVWRITE,0x115e5e28fbe0)
> 53152/557765 vmd RET ioctl 0
> 53152/557765 vmd CALL ioctl(4,_IOW('V',6,0xc),0x115ebba883c4)
> 53152/557765 vmd RET ioctl 0
> 53152/557765 vmd CALL ioctl(4,PVBUSIOC_KVWRITE,0x115e5e28fbe0)
> 53152/557765 vmd RET ioctl 0
> 53152/557765 vmd CALL ioctl(4,_IOW('V',6,0xc),0x115ebba883c4)
> 53152/557765 vmd RET ioctl 0
> 53152/557765 vmd CALL ioctl(4,PVBUSIOC_KVWRITE,0x115e5e28fbe0)
> 53152/557765 vmd RET ioctl 0
> 53152/557765 vmd CALL ioctl(4,_IOW('V',6,0xc),0x115ebba883c4)
> 53152/557765 vmd RET ioctl 0
> 53152/557765 vmd CALL ioctl(4,PVBUSIOC_KVWRITE,0x115e5e28fbe0)
> 53152/557765 vmd RET ioctl 0
> 53152/557765 vmd CALL ioctl(4,_IOW('V',6,0xc),0x115ebba883c4)
> 53152/557765 vmd RET ioctl 0
> 53152/557765 vmd CALL ioctl(4,PVBUSIOC_KVWRITE,0x115e5e28fbe0)
> 53152/557765 vmd RET ioctl 0
> 53152/557765 vmd CALL
> __thrsleep(0x115e6c571638,CLOCK_REALTIME,0,0x115e9c14c8c0,0x115e6c571704)
>
> What's interesting is that if I press a key on my keyboard, the install
> goes further:
>
> localhost:~# setup-alpine
> 14% [##### ]
>
> So I assume it's an interrupt related problem. Pressing a 'key' works up
> to some point. Then the host completely hang. At this moment both threads
> are waiting in the kernel:
>
> PID TID PRI NICE SIZE RES STATE WAIT TIME CPU COMMAND
> 53152 193294 64 0 515M 10M onproc/2 - 1:36 98.68% vmd
> 53152 557765 28 0 515M 10M sleep/3 thrslee 0:15 0.00% vmd
> 53152 123587 28 0 515M 10M idle thrslee 0:00 0.00% vmd
>
> Here's what ktrace reports, similar to what tedu@ reported [0] as well:
>
> 53152/193294 vmd STRU struct timespec { 0.001893000 }
> 53152/193294 vmd STRU struct kevent { ident=8, filter=EVFILT_READ,
> flags=0x1<EV_ADD>, fflags=0<>, data=7, udata=0x115bd87a6638 }
> 53152/193294 vmd RET kevent 1
> 53152/193294 vmd CALL clock_gettime(CLOCK_MONOTONIC,0x115e7354e140)
> 53152/193294 vmd STRU struct timespec { 3064.065132538 }
> 53152/193294 vmd RET clock_gettime 0
> 53152/193294 vmd CALL
> kevent(5,0x115dee35f000,0,0x115e7971d800,64,0x115e7354e110)
> 53152/193294 vmd STRU struct timespec { 0.001862000 }
> 53152/193294 vmd STRU struct kevent { ident=8, filter=EVFILT_READ,
> flags=0x1<EV_ADD>, fflags=0<>, data=7, udata=0x115bd87a6638 }
> 53152/193294 vmd RET kevent 1
> 53152/193294 vmd CALL clock_gettime(CLOCK_MONOTONIC,0x115e7354e140)
> 53152/193294 vmd STRU struct timespec { 3064.065167529 }
> 53152/193294 vmd RET clock_gettime 0
> 53152/193294 vmd CALL
> kevent(5,0x115dee35f000,0,0x115e7971d800,64,0x115e7354e110)
> 53152/193294 vmd STRU struct timespec { 0.001827000 }
> 53152/193294 vmd STRU struct kevent { ident=8, filter=EVFILT_READ,
> flags=0x1<EV_ADD>, fflags=0<>, data=7, udata=0x115bd87a6638 }
> 53152/193294 vmd RET kevent 1
> 53152/193294 vmd CALL clock_gettime(CLOCK_MONOTONIC,0x115e7354e140)
> 53152/193294 vmd STRU struct timespec { 3064.065179123 }
> 53152/193294 vmd RET clock_gettime 0
> 53152/193294 vmd CALL
> kevent(5,0x115dee35f000,0,0x115e7971d800,64,0x115
>
> [0] http://marc.info/?l=openbsd-bugs&m=149622771015472&w=2
>
> Dmesg below.
>
> OpenBSD 6.1-current (GENERIC.MP) #456: Thu Jun 1 11:48:17 CEST 2017
> [email protected]:/sys/arch/amd64/compile/GENERIC.MP
> RTC BIOS diagnostic error 80<clock_battery>
> real mem = 8238301184 (7856MB)
> avail mem = 7982833664 (7613MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xccbfd000 (65 entries)
> bios0: vendor LENOVO version "N14ET26W (1.04 )" date 01/23/2015
> bios0: LENOVO 20BS006BGE
> acpi0 at bios0: rev 2
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP SLIC ASF! HPET ECDT APIC MCFG SSDT SSDT SSDT SSDT
> SSDT SSDT SSDT SSDT SSDT SSDT PCCT SSDT UEFI MSDM BATB FPDT UEFI DMAR
> acpi0: wakeup devices LID_(S4) SLPB(S3) IGBE(S4) EXP2(S4) XHCI(S3) EHC1(S3)
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpihpet0 at acpi0: 14318179 Hz
> acpiec0 at acpi0
> acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz, 2394.81 MHz
> cpu0:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,PT,SENSOR,ARAT
> cpu0: 256KB 64b/line 8-way L2 cache
> cpu0: TSC frequency 2394813840 Hz
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 99MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz, 2394.46 MHz
> cpu1:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,PT,SENSOR,ARAT
> cpu1: 256KB 64b/line 8-way L2 cache
> cpu1: smt 1, core 0, package 0
> cpu2 at mainbus0: apid 2 (application processor)
> cpu2: Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz, 2394.46 MHz
> cpu2:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,PT,SENSOR,ARAT
> cpu2: 256KB 64b/line 8-way L2 cache
> cpu2: smt 0, core 1, package 0
> cpu3 at mainbus0: apid 3 (application processor)
> cpu3: Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz, 2394.46 MHz
> cpu3:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,PT,SENSOR,ARAT
> cpu3: 256KB 64b/line 8-way L2 cache
> cpu3: smt 1, core 1, package 0
> ioapic0 at mainbus0: apid 2 pa 0xfec00000, version 20, 40 pins
> acpimcfg0 at acpi0 addr 0xf8000000, bus 0-63
> acpiprt0 at acpi0: bus 0 (PCI0)
> acpiprt1 at acpi0: bus -1 (PEG_)
> acpiprt2 at acpi0: bus 3 (EXP1)
> acpiprt3 at acpi0: bus 4 (EXP2)
> acpiprt4 at acpi0: bus -1 (EXP3)
> acpiprt5 at acpi0: bus -1 (EXP6)
> acpicpu0 at acpi0: C3(200@233 mwait.1@0x40), C2(200@148 mwait.1@0x33),
> C1(1000@1 mwait.1), PSS
> acpicpu1 at acpi0: C3(200@233 mwait.1@0x40), C2(200@148 mwait.1@0x33),
> C1(1000@1 mwait.1), PSS
> acpicpu2 at acpi0: C3(200@233 mwait.1@0x40), C2(200@148 mwait.1@0x33),
> C1(1000@1 mwait.1), PSS
> acpicpu3 at acpi0: C3(200@233 mwait.1@0x40), C2(200@148 mwait.1@0x33),
> C1(1000@1 mwait.1), PSS
> acpipwrres0 at acpi0: PUBS, resource for XHCI, EHC1
> acpipwrres1 at acpi0: NVP3, resource for PEG_
> acpipwrres2 at acpi0: NVP2, resource for PEG_
> acpitz0 at acpi0: critical temperature is 128 degC
> acpibtn0 at acpi0: LID_
> acpibtn1 at acpi0: SLPB
> "LEN0071" at acpi0 not configured
> "LEN0048" at acpi0 not configured
> acpibat0 at acpi0: BAT0 model "00HW003" serial 887 type LiP oem "SMP"
> acpiac0 at acpi0: AC unit online
> acpithinkpad0 at acpi0
> "PNP0C14" at acpi0 not configured
> "PNP0C14" at acpi0 not configured
> "PNP0C14" at acpi0 not configured
> "INT340F" at acpi0 not configured
> acpivideo0 at acpi0: VID_
> acpivout at acpivideo0 not configured
> acpivideo1 at acpi0: VID_
> cpu0: Enhanced SpeedStep 2394 MHz: speeds: 2401, 2400, 2300, 2100, 2000,
> 1900, 1700, 1600, 1400, 1300, 1200, 1000, 900, 800, 600, 500 MHz
> pci0 at mainbus0 bus 0
> pchb0 at pci0 dev 0 function 0 "Intel Core 5G Host" rev 0x09
> inteldrm0 at pci0 dev 2 function 0 "Intel HD Graphics 5500" rev 0x09
> drm0 at inteldrm0
> inteldrm0: msi
> inteldrm0: 2560x1440, 32bpp
> wsdisplay0 at inteldrm0 mux 1: console (std, vt100 emulation)
> wsdisplay0: screen 1-5 added (std, vt100 emulation)
> azalia0 at pci0 dev 3 function 0 "Intel Core 5G HD Audio" rev 0x09: msi
> xhci0 at pci0 dev 20 function 0 "Intel 9 Series xHCI" rev 0x03: msi
> usb0 at xhci0: USB revision 3.0
> uhub0 at usb0 configuration 1 interface 0 "Intel xHCI root hub" rev 3.00/1.00
> addr 1
> "Intel 9 Series MEI" rev 0x03 at pci0 dev 22 function 0 not configured
> em0 at pci0 dev 25 function 0 "Intel I218-V" rev 0x03: msi, address
> 54:ee:75:43:b3:62
> azalia1 at pci0 dev 27 function 0 "Intel 9 Series HD Audio" rev 0x03: msi
> azalia1: codecs: Realtek ALC292
> audio0 at azalia1
> ppb0 at pci0 dev 28 function 0 "Intel 9 Series PCIE" rev 0xe3: msi
> pci1 at ppb0 bus 3
> ppb1 at pci0 dev 28 function 1 "Intel 9 Series PCIE" rev 0xe3: msi
> pci2 at ppb1 bus 4
> iwm0 at pci2 dev 0 function 0 "Intel Dual Band Wireless AC 7265" rev 0x59, msi
> ehci0 at pci0 dev 29 function 0 "Intel 9 Series USB" rev 0x03: apic 2 int 23
> usb1 at ehci0: USB revision 2.0
> uhub1 at usb1 configuration 1 interface 0 "Intel EHCI root hub" rev 2.00/1.00
> addr 1
> pcib0 at pci0 dev 31 function 0 "Intel 9 Series LPC" rev 0x03
> ahci0 at pci0 dev 31 function 2 "Intel 9 Series AHCI" rev 0x03: msi, AHCI 1.3
> ahci0: port 3: 6.0Gb/s
> scsibus1 at ahci0: 32 targets
> sd0 at scsibus1 targ 3 lun 0: <ATA, SAMSUNG MZNTE256, EXT2> SCSI3 0/direct
> fixed naa.5002538844584d30
> sd0: 244198MB, 512 bytes/sector, 500118192 sectors, thin
> ichiic0 at pci0 dev 31 function 3 "Intel 9 Series SMBus" rev 0x03: apic 2 int
> 18
> iic0 at ichiic0
> pchtemp0 at pci0 dev 31 function 6 "Intel 9 Series Thermal" rev 0x03
> isa0 at pcib0
> isadma0 at isa0
> pckbc0 at isa0 port 0x60/5 irq 1 irq 12
> pckbd0 at pckbc0 (kbd slot)
> wskbd0 at pckbd0: console keyboard, using wsdisplay0
> pms0 at pckbc0 (aux slot)
> wsmouse0 at pms0 mux 0
> wsmouse1 at pms0 mux 0
> pms0: Synaptics clickpad, firmware 8.1
> pcppi0 at isa0 port 0x61
> spkr0 at pcppi0
> vmm0 at mainbus0: VMX/EPT
> umb0 at uhub0 port 4 configuration 1 interface 0 "Sierra Wireless Inc. Sierra
> Wireless EM7345 4G LTE" rev 2.00/17.29 addr 2
> umodem0 at uhub0 port 4 configuration 1 interface 2 "Sierra Wireless Inc.
> Sierra Wireless EM7345 4G LTE" rev 2.00/17.29 addr 2
> umodem0: data interface 3, has no CM over data, has break
> umodem0: status change notification available
> ucom0 at umodem0
> ugen0 at uhub0 port 7 "Intel Bluetooth" rev 2.01/0.01 addr 3
> uvideo0 at uhub0 port 8 configuration 1 interface 0 "Chicony Electronics
> Co.,Ltd. Integrated Camera" rev 2.00/0.29 addr 4
> video0 at uvideo0
> uhub2 at uhub1 port 1 configuration 1 interface 0 "Intel Rate Matching Hub"
> rev 2.00/0.03 addr 2
> vscsi0 at root
> scsibus2 at vscsi0: 256 targets
> softraid0 at root
> scsibus3 at softraid0: 256 targets
> sd1 at scsibus3 targ 1 lun 0: <OPENBSD, SR CRYPTO, 006> SCSI2 0/direct fixed
> sd1: 244190MB, 512 bytes/sector, 500102858 sectors
> root on sd1a (72519550243ad631.a) swap on sd1b dump on sd1b
>