[ CCing beck@ and tedu@ as a courtesy, because the commit that resulted in the following behavior was checked-in/OKed by them.
Also CCing mlarkin@ and Paul de Weerd because the bug report "vmd eating lots of memory" on misc@ may potentially be related: https://marc.info/?t=156363281300001&r=1&w=2 ] >Synopsis: When using a vnd(4) device, attempting to copy more than about 2 or 3 gigabytes to it causes the system to hang. The Cache: memory line increases to use all physical memory, but the system does not enter swap. >Category: system >Environment: System : OpenBSD 6.5 Details : OpenBSD 6.5-current (GENERIC.MP-PPPOE_TERM_UNKNOWN_SESSIONS) #6: Wed Jul 24 15:01:01 JST 2019 [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP-PPPOE_TERM_UNKNOWN_SESSIONS Architecture: OpenBSD.amd64 Machine : amd64 >Description: When copying files to a vnd(4) device, the system will continually use more and more memory until it hangs. rsync(1) shows a process stuck in "needbuf" under the WAIT column. Though it only hangs partially. I can still move the mouse, and if an xterm has been opened, I can still type text into it. Trying to actually execute a program however, does not work. Text can still be typed into the terminal after pressing return, but the terminal does not respond to ^C and the program never launches. Switching from X11 to the console via CTRL+ALT+F1 causes a hard hang that requires holding the power button for 5 seconds in order to shut down the system. top(1) shows the "Cache:" line continually increasing up to the point of using all physical memory. However, once it gets there the system does not start using swap. The same behavior is exhibited when cp(1) is used instead of rsync(1), so it does not seem to be program specific. In some cases, I can ^C rsync or cp(1) and the system can still be used, but the process stuck in "needbuf" never goes away (or at least, not in the 5 or so minutes I waited). It also cannot be pkill(1)ed, even with a "pkill -9 rsync". I am aware that the following dmesg(1) comes from a custom kernel. However, the problem still occurs on a snapshot. >How-To-Repeat: # I can usually trigger the bug after copying 2-3 gigabytes of # files to the vnd(4) device. I create a 5 gigabyte vnd(4) device # here. If your system has much more memory than mine does (16 GB), # it may take more to trigger the bug. dd if=/dev/zero of=vntest1.raw bs=1m count=5000 vnconfig vnd0 vntest1.raw fdisk -i vnd0 disklabel -E vnd0 OpenBSD area: 128-10240000; size: 10239872; free: 0 # size offset fstype [fsize bsize cpg] c: 10240000 0 unused d: 10239872 128 4.2BSD 2048 16384 1 newfs -O2 vnd0d mount /dev/vnd0d /mnt/vndtest doas rsync -au --progress /usr/obj/* /mnt/vndtest # It doesn't matter where the files come from, I just picked a # filesystem containing numerous files, with a mix of large and # small files as well. When the system hangs, top shows the following: PID USERNAME PRI NICE SIZE RES STATE WAIT TIME CPU COMMAND 95098 root -6 0 60M 4508K idle needbuf 0:29 0.00% rsync 43786 root 2 0 60M 5280K idle select 0:21 0.00% rsync 62524 root 10 0 32M 3520K idle inode 0:03 0.00% rsync Eventually, the rsync processes in "select" and "inode" will terminate, but the one stuck in "needbuf" does not. Since it may be relevant, I have tweaked a few knobs in sysctl.conf. For the sake of brevity, many default and commented lines have been omitted. If I have shot myself in the foot by turning these knobs, I will take full responsibility, however I have isolated the commit that caused the hang to start and experienced no issues prior. ---8<--- # $OpenBSD: sysctl.conf,v 1.56 2014/05/06 23:05:51 tedu Exp $ # # This file contains a list of sysctl options the user wants set at # boot time. See sysctl(3) and sysctl(8) for more information on # the many available variables. # net.inet.ip.forwarding=1 # 1=Permit forwarding (routing) of IPv4 packets net.inet6.ip6.forwarding=1 # 1=Permit forwarding (routing) of IPv6 packets machdep.kbdreset=1 # permit console CTRL-ALT-DEL to do a nice halt machdep.lidaction=0 # laptop lid closes cause a suspend 0=no 1=yes kern.bufcachepercent=60 # As of 2018-04-02, 90% breaks networking kern.maxfiles=30000 # increase maxfiles for KDE4 kern.shminfo.shmall=262144 # For KDE4, increase shared memory a ton for memory-hogs, in hw.pagesize units. Currently set to 1024MB kern.shminfo.shmmni=1024 # KDE4 wants this increased too kern.maxvnodes=1000000 # increase vnode cache to improve pathname->inode lookups kern.somaxconn=2048 # increase total possible open sockets kern.maxproc=4096 net.inet.ipcomp.enable=1 vfs.ffs.dirhash_maxmem=33554432 # increase dirhash memory to 32MB to speed up scanning large directories ---8<--- >Fix: Reverting the following commit fixes the issue for me. I also had to adjust src/sys/sys/buf.h because the prototype of getblk() changed from struct buf *getblk(struct vnode *, daddr_t, int, int, int); to struct buf *getblk(struct vnode *, daddr_t, int, int, uint64_t); in the interim. ---8<--- CVSROOT: /cvs Module name: src Changes by: [email protected] 2019/05/08 06:40:57 Modified files: sys/kern : vfs_bio.c vfs_biomem.c Log message: Modify the buffer cache to always flip recovered DMA buffers high. This also modifies the backoff logic to only back off what is requested and not a "mimimum" amount. Tested by me, benno@, tedu@ anda ports build by naddy@. ok tedu@ ---8<--- dmesg: OpenBSD 6.5-current (GENERIC.MP-PPPOE_TERM_UNKNOWN_SESSIONS) #6: Wed Jul 24 15:01:01 JST 2019 [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP-PPPOE_TERM_UNKNOWN_SESSIONS real mem = 16858365952 (16077MB) avail mem = 16337305600 (15580MB) mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xbcc72000 (67 entries) bios0: vendor LENOVO version "GLET99WW (2.53 )" date 05/21/2019 bios0: LENOVO 20AWS27D00 acpi0 at bios0: ACPI 5.0 acpi0: sleep states S0 S3 S4 S5 acpi0: tables DSDT FACP SLIC DBGP ECDT HPET APIC MCFG SSDT SSDT SSDT SSDT SSDT SSDT SSDT PCCT SSDT TCPA UEFI MSDM ASF! BATB FPDT UEFI DMAR acpi0: wakeup devices LID_(S4) SLPB(S3) IGBE(S4) EXP2(S4) EXP3(S4) XHCI(S3) EHC1(S3) EHC2(S3) acpitimer0 at acpi0: 3579545 Hz, 24 bits acpiec0 at acpi0 acpihpet0 at acpi0: 14318179 Hz acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: Intel(R) Core(TM) i7-4800MQ CPU @ 2.70GHz, 2694.17 MHz, 06-3c-03 cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN cpu0: 256KB 64b/line 8-way L2 cache cpu0: smt 0, core 0, package 0 mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges cpu0: apic clock running at 99MHz cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4, IBE cpu1 at mainbus0: apid 2 (application processor) cpu1: Intel(R) Core(TM) i7-4800MQ CPU @ 2.70GHz, 2693.77 MHz, 06-3c-03 cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN cpu1: 256KB 64b/line 8-way L2 cache cpu1: smt 0, core 1, package 0 cpu2 at mainbus0: apid 4 (application processor) cpu2: Intel(R) Core(TM) i7-4800MQ CPU @ 2.70GHz, 2693.77 MHz, 06-3c-03 cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN cpu2: 256KB 64b/line 8-way L2 cache cpu2: smt 0, core 2, package 0 cpu3 at mainbus0: apid 6 (application processor) cpu3: Intel(R) Core(TM) i7-4800MQ CPU @ 2.70GHz, 2693.77 MHz, 06-3c-03 cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN cpu3: 256KB 64b/line 8-way L2 cache cpu3: smt 0, core 3, package 0 ioapic0 at mainbus0: apid 2 pa 0xfec00000, version 20, 24 pins acpimcfg0 at acpi0 acpimcfg0: addr 0xf8000000, bus 0-63 acpiprt0 at acpi0: bus 0 (PCI0) acpiprt1 at acpi0: bus -1 (PEG0) acpiprt2 at acpi0: bus -1 (PEG_) acpiprt3 at acpi0: bus 2 (EXP1) acpiprt4 at acpi0: bus 3 (EXP2) acpiprt5 at acpi0: bus -1 (EXP3) acpiprt6 at acpi0: bus -1 (EXP6) acpicpu0 at acpi0: C2(200@148 mwait.1@0x33), C1(1000@1 mwait.1), PSS acpicpu1 at acpi0: C2(200@148 mwait.1@0x33), C1(1000@1 mwait.1), PSS acpicpu2 at acpi0: C2(200@148 mwait.1@0x33), C1(1000@1 mwait.1), PSS acpicpu3 at acpi0: C2(200@148 mwait.1@0x33), C1(1000@1 mwait.1), PSS acpipwrres0 at acpi0: PUBS, resource for XHCI, EHC1, EHC2 acpipwrres1 at acpi0: NVP3, resource for PEG_ acpipwrres2 at acpi0: NVP2, resource for PEG_ acpitz0 at acpi0: critical temperature is 200 degC acpibtn0 at acpi0: LID_ acpibtn1 at acpi0: SLPB acpipci0 at acpi0 PCI0: 0x00000000 0x00000011 0x00000001 acpicmos0 at acpi0 tpm0 at acpi0: TPM_ addr 0xfed40000/0x5000, device 0x0000104a rev 0x4e acpibat0 at acpi0: BAT0 model "45N1161" serial 3584 type LION oem "LGC" acpiac0 at acpi0: AC unit online acpithinkpad0 at acpi0 "PNP0C14" at acpi0 not configured "PNP0C14" at acpi0 not configured "PNP0C14" at acpi0 not configured "INT340F" at acpi0 not configured acpivideo0 at acpi0: VID_ acpivout at acpivideo0 not configured acpivideo1 at acpi0: VID_ cpu0: using VERW MDS workaround (except on vmm entry) cpu0: Enhanced SpeedStep 2694 MHz: speeds: 2701, 2700, 2600, 2400, 2300, 2200, 2000, 1900, 1700, 1600, 1500, 1300, 1200, 1100, 900, 800 MHz pci0 at mainbus0 bus 0 pchb0 at pci0 dev 0 function 0 "Intel Core 4G Host" rev 0x06 inteldrm0 at pci0 dev 2 function 0 "Intel HD Graphics 4600" rev 0x06 drm0 at inteldrm0 inteldrm0: msi azalia0 at pci0 dev 3 function 0 "Intel Core 4G HD Audio" rev 0x06: msi azalia0: No codecs found xhci0 at pci0 dev 20 function 0 "Intel 8 Series xHCI" rev 0x04: msi, xHCI 1.0 usb0 at xhci0: USB revision 3.0 uhub0 at usb0 configuration 1 interface 0 "Intel xHCI root hub" rev 3.00/1.00 addr 1 "Intel 8 Series MEI" rev 0x04 at pci0 dev 22 function 0 not configured em0 at pci0 dev 25 function 0 "Intel I217-LM" rev 0x04: msi, address xx:xx:xx:xx:xx:xx ehci0 at pci0 dev 26 function 0 "Intel 8 Series USB" rev 0x04: apic 2 int 16 usb1 at ehci0: USB revision 2.0 uhub1 at usb1 configuration 1 interface 0 "Intel EHCI root hub" rev 2.00/1.00 addr 1 azalia1 at pci0 dev 27 function 0 "Intel 8 Series HD Audio" rev 0x04: msi azalia1: codecs: Realtek ALC292 audio0 at azalia1 ppb0 at pci0 dev 28 function 0 "Intel 8 Series PCIE" rev 0xd4: msi pci1 at ppb0 bus 2 rtsx0 at pci1 dev 0 function 0 "Realtek RTS5227 Card Reader" rev 0x01: msi sdmmc0 at rtsx0: 4-bit, dma ppb1 at pci0 dev 28 function 1 "Intel 8 Series PCIE" rev 0xd4: msi pci2 at ppb1 bus 3 iwm0 at pci2 dev 0 function 0 "Intel Dual Band Wireless AC 7260" rev 0x83, msi ehci1 at pci0 dev 29 function 0 "Intel 8 Series USB" rev 0x04: apic 2 int 23 usb2 at ehci1: USB revision 2.0 uhub2 at usb2 configuration 1 interface 0 "Intel EHCI root hub" rev 2.00/1.00 addr 1 pcib0 at pci0 dev 31 function 0 "Intel QM87 LPC" rev 0x04 ahci0 at pci0 dev 31 function 2 "Intel 8 Series AHCI" rev 0x04: msi, AHCI 1.3 ahci0: port 0: 6.0Gb/s ahci0: port 5: 1.5Gb/s scsibus1 at ahci0: 32 targets sd0 at scsibus1 targ 0 lun 0: <ATA, Samsung SSD 860, RVT0> SCSI3 0/direct fixed XXXXXXXXXXXXXX sd0: 953869MB, 512 bytes/sector, 1953525168 sectors, thin cd0 at scsibus1 targ 5 lun 0: <PLDS, DVD-RW DU8A5SH, BU51> ATAPI 5/cdrom removable ichiic0 at pci0 dev 31 function 3 "Intel 8 Series SMBus" rev 0x04: apic 2 int 18 iic0 at ichiic0 isa0 at pcib0 isadma0 at isa0 pckbc0 at isa0 port 0x60/5 irq 1 irq 12 pckbd0 at pckbc0 (kbd slot) wskbd0 at pckbd0: console keyboard pms0 at pckbc0 (aux slot) wsmouse0 at pms0 mux 0 wsmouse1 at pms0 mux 0 pms0: Synaptics clickpad, firmware 8.2, 0x1e2b1 0x943300 pcppi0 at isa0 port 0x61 spkr0 at pcppi0 vmm0 at mainbus0: VMX/EPT sdmmc0: can't enable card uhub0: device problem, disabling port 2 uhidev0 at uhub0 port 6 configuration 1 interface 0 "Logitech USB Laser Mouse" rev 2.00/56.01 addr 2 uhidev0: iclass 3/1 ums0 at uhidev0: 8 buttons, Z and W dir wsmouse2 at ums0 mux 0 ugen0 at uhub0 port 7 "Validity Sensors VFS5011 Fingerprint Reader" rev 1.10/0.78 addr 3 ugen1 at uhub0 port 11 "Intel product 0x07dc" rev 2.00/0.01 addr 4 uvideo0 at uhub0 port 12 configuration 1 interface 0 "SunplusIT INC. Integrated Camera" rev 2.00/0.03 addr 5 video0 at uvideo0 umass0 at uhub0 port 16 configuration 1 interface 0 "Seagate Expansion Desk" rev 3.00/1.00 addr 6 umass0: using SCSI over Bulk-Only scsibus2 at umass0: 2 targets, initiator 0 sd1 at scsibus2 targ 1 lun 0: <Seagate, Expansion Desk, 9441> SCSI4 0/direct fixed serial.XXXXXXXXXXX sd1: 7630885MB, 512 bytes/sector, 15628053167 sectors uhub3 at uhub1 port 1 configuration 1 interface 0 "Intel Rate Matching Hub" rev 2.00/0.04 addr 2 uhub4 at uhub2 port 1 configuration 1 interface 0 "Intel Rate Matching Hub" rev 2.00/0.04 addr 2 vscsi0 at root scsibus3 at vscsi0: 256 targets softraid0 at root scsibus4 at softraid0: 256 targets softraid0: sd2 was not shutdown properly sd2 at scsibus4 targ 1 lun 0: <OPENBSD, SR CRYPTO, 006> SCSI2 0/direct fixed sd2: 926715MB, 512 bytes/sector, 1897913570 sectors root on sd2a (820063e3b3244e77.a) swap on sd2b dump on sd2b WARNING: / was not properly unmounted inteldrm0: 1920x1080, 32bpp wsdisplay0 at inteldrm0 mux 1: console (std, vt100 emulation), using wskbd0 wsdisplay0: screen 1-5 added (std, vt100 emulation) iwm0: hw rev 0x140, fw ver 16.242414.0, address xx:xx:xx:xx:xx:xx
