I wanted to follow up on this for the archives since I found the cause (and solution).

After trying various things, I discovered that with
kern.bufcachepercent=90 set in /etc/sysctl.conf, I would get these
network drops along with the aforementioned "em0: unable to fill any
rx descriptors" errors. I returned the setting to the default of
kern.bufcachepercent=20 and the problems went away. I then tried
kern.bufcachepercent=40 and things are still OK.

So for the archives, it seems like setting kern.bufcachepercent=90 is
a bit too aggressive and (at least on my system) keeps em0 from being
able to allocate enough memory to function properly under high load.
I guess this is the reason why the default kern.bufcachepercent is set
to what it is.

I broke my system and kept all the pieces, though it looks like I
finally managed to put them all back together again.

I apologize if I caused any developers to waste their time chasing a
bug that I myself caused by tweaking a knob too far into the red.

--
Bryan

On 2017-09-14 19:32:15, Bryan Linton <b...@shoshoni.info> wrote:
> FWIW, I get the same behavior on GENERIC.MP, so I don't think the
> PPPOE_TERM_UNKNOWN_SESSIONS kernel option is causing this.
>
> If I can provide any more information, please let me know.
>
> --
> Bryan
>
> On 2017-09-03 05:59:51, Bryan Linton <b...@shoshoni.info> wrote:
> > >Synopsis: em0 loses connectivity due to low mbufs
> > >Category: system
> > >Environment:
> > System : OpenBSD 6.2
> > Details : OpenBSD 6.2-beta (GENERIC.MP-PPPOE_TERM_UNKNOWN_SESSIONS)
> > #22: Wed Aug 30 19:23:17 JST 2017
> >
> > shoshon...@shoshoni-m.shoshoni.info:/usr/src/sys/arch/amd64/compile/GENERIC.MP-PPPOE_TERM_UNKNOWN_SESSIONS
> >
> > Architecture: OpenBSD.amd64
> > Machine : amd64
> > >Description:
> >
> > I've been seeing random drops in network connectivity that I've
> > traced to what appears to be not enough mbufs being used.
> > The issue is seen on the following em0 controller in a Thinkpad T440p:
> >
> > em0 at pci0 dev 25 function 0 "Intel I217-LM" rev 0x04: msi
> >
> > A freshly booted system will work fine, but heavily using the
> > network (for example, by transferring large files over a LAN) will
> > cause connectivity to drop sooner rather than later. But even
> > only lightly using the network will eventually cause the issue to
> > surface.
> >
> > Rebooting always fixes the issue. Issuing a "zzz" command will
> > also fix it most of the time, but not always. Some of the time, a
> > simple "ifconfig em0 down up" will fix it, but usually only once
> > or twice. After that, the system must either be zzz'ed or
> > rebooted.
> >
> > I've attached "systat mbuf" output for various states of "working"
> > vs. "not working".
> >
> > Note how the ALIVE value drops below the LWM value when it's not
> > working.
> >
> > FULL DISCLOSURE: I am running a kernel with the
> > PPPOE_TERM_UNKNOWN_SESSIONS option set. I do not believe it
> > would affect the em0 driver, but I suppose it's possible that it
> > could.
> >
> > vvvvvvvvvvvvvvvvvvv WORKING vvvvvvvvvvvvvvvvvvvvv
> >
> > IFACE LIVELOCKS SIZE ALIVE LWM HWM CWM
> > System 0 256 418 30
> > 2048 17 9
> > 2112 15 5
> > 4096 256 37
> > lo0
> > em0 2050 15 10 256 15
> > iwm0
> > enc0
> > pppoe0
> > pflog0
> >
> >
> >
> >
> > vvvvvvv NON-WORKING (immediately after connectivity is lost) vvvvvvvvv
> > (Note that ALIVE is lower than LWM and CWM is currently 145)
> >
> > IFACE LIVELOCKS SIZE ALIVE LWM HWM CWM
> > System 0 256 1509 160
> > 2048 31 29
> > 2112 1036 72
> > 4096 256 41
> > lo0
> > em0 2050 2 10 256 145
> > iwm0
> > enc0
> > pppoe0
> > pflog0
> >
> >
> > vvvvvvvvvvvvvvv NON-WORKING (after a minute or so) vvvvvvvvvvvvvvvvv
> > (Note how the CWM has steadily increased to 256 over the last
> > minute)
> >
> > IFACE LIVELOCKS SIZE ALIVE LWM HWM CWM
> > System 0 256 1656 160
> > 2048 32 29
> > 2112 1179 81
> > 4096 256 41
> > lo0
> > em0 2050 2 10 256 256
> > iwm0
> > enc0
> > pppoe0
> > pflog0
> >
> >
> > vvvvvvvvvvvvvvv NON-WORKING (after "ifconfig em0 down up") vvvvvvvvvvvv
> > (In this instance, "ifconfig em0 down up" didn't work, but it did
> > reset CWM back to LWM)
> >
> > IFACE LIVELOCKS SIZE ALIVE LWM HWM CWM
> > System 0 256 1711 160
> > 2048 43 29
> > 2112 1191 82
> > 4096 256 41
> > lo0
> > em0 2050 1 10 256 10
> > iwm0
> > enc0
> > pppoe0
> > pflog0
> >
> >
> > vvvvvvvvvvv NON-WORKING (after repeated "ifconfig em0 down up") vvvvvvvv
> > (Now there is nothing reported for em0 at all. After getting into
> > this state, dmesg showed the following lines:
> >
> > em0: unable to fill any rx descriptors
> > em0: unable to fill any rx descriptors
> > em0: unable to fill any rx descriptors
> > em0: unable to fill any rx descriptors
> > em0: unable to fill any rx descriptors
> > em0: unable to fill any rx descriptors
> > em0: unable to fill any rx descriptors
> > )
> >
> > IFACE LIVELOCKS SIZE ALIVE LWM HWM CWM
> > System 0 256 1742 160
> > 2048 54 29
> > 2112 1215 84
> > 4096 256 41
> > lo0
> > em0
> > iwm0
> > enc0
> > pppoe0
> > pflog0
> > -----------------------------------------------
> >
> > I am willing to provide any additional needed information, as well
> > as test any potential patches. Please let me know if I can
> > provide any additional details.
> >
> >
> > >How-To-Repeat:
> > Saturate the network connection. Eventually, the system
> > will stop receiving network data.
> > >Fix:
> > Temporary fix: Either issue "ifconfig em0 down up", "zzz",
> > or reboot.
> >
> > Permanent fix: Unknown. I attempted to revert, in turn,
> > the if_em* files all the way up to a Jan 23rd commit to
> > see if it was due to any recent commits there, but the
> > kernel panicked upon booting when the if_em* files were
> > reverted to that point. I think there has been too much
> > progress in the rest of the system to successfully revert
> > to such a long time ago.
> >
> >
> > dmesg:
> > OpenBSD 6.2-beta (GENERIC.MP-PPPOE_TERM_UNKNOWN_SESSIONS) #22: Wed Aug 30
> > 19:23:17 JST 2017
> >
> > shoshon...@shoshoni-m.shoshoni.info:/usr/src/sys/arch/amd64/compile/GENERIC.MP-PPPOE_TERM_UNKNOWN_SESSIONS
> > real mem = 12539871232 (11958MB)
> > avail mem = 12152803328 (11589MB)
> > mpath0 at root
> > scsibus0 at mpath0: 256 targets
> > mainbus0 at root
> > bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xbcc0d000 (67 entries)
> > bios0: vendor LENOVO version "GLET85WW (2.39 )" date 09/29/2016
> > bios0: LENOVO 20AWS27D00
> > acpi0 at bios0: rev 2
> > acpi0: sleep states S0 S3 S4 S5
> > acpi0: tables DSDT FACP SLIC DBGP ECDT HPET APIC MCFG SSDT SSDT SSDT SSDT
> > SSDT SSDT SSDT PCCT SSDT TCPA UEFI MSDM ASF! BATB FPDT UEFI DMAR
> > acpi0: wakeup devices LID_(S4) SLPB(S3) IGBE(S4) EXP2(S4) EXP3(S4) XHCI(S3)
> > EHC1(S3) EHC2(S3)
> > acpitimer0 at acpi0: 3579545 Hz, 24 bits
> > acpiec0 at acpi0
> > acpihpet0 at acpi0: 14318179 Hz
> > acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
> > cpu0 at mainbus0: apid 0 (boot processor)
> > cpu0: Intel(R) Core(TM) i5-4300M CPU @ 2.60GHz, 2594.37 MHz
> > cpu0:
> > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SENSOR,ARAT
> > cpu0: 256KB 64b/line 8-way L2 cache
> > cpu0: TSC frequency 2594368320 Hz
> > cpu0: smt 0, core 0, package 0
> > mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> > cpu0: apic clock running at 99MHz
> > cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4, IBE
> > cpu1 at mainbus0: apid 1 (application processor)
> > cpu1: Intel(R) Core(TM) i5-4300M CPU @ 2.60GHz, 2593.99 MHz
> > cpu1:
> > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SENSOR,ARAT
> > cpu1: 256KB 64b/line 8-way L2 cache
> > cpu1: smt 1, core 0, package 0
> > cpu2 at mainbus0: apid 2 (application processor)
> > cpu2: Intel(R) Core(TM) i5-4300M CPU @ 2.60GHz, 2593.99 MHz
> > cpu2:
> > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SENSOR,ARAT
> > cpu2: 256KB 64b/line 8-way L2 cache
> > cpu2: smt 0, core 1, package 0
> > cpu3 at mainbus0: apid 3 (application processor)
> > cpu3: Intel(R) Core(TM) i5-4300M CPU @ 2.60GHz, 2593.99 MHz
> > cpu3:
> > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SENSOR,ARAT
> > cpu3: 256KB 64b/line 8-way L2 cache
> > cpu3: smt 1, core 1, package 0
> > ioapic0 at mainbus0: apid 2 pa 0xfec00000, version 20, 24 pins
> > acpimcfg0 at acpi0 addr 0xf8000000, bus 0-63
> > acpiprt0 at acpi0: bus 0 (PCI0)
> > acpiprt1 at acpi0: bus -1 (PEG0)
> > acpiprt2 at acpi0: bus -1 (PEG_)
> > acpiprt3 at acpi0: bus 2 (EXP1)
> > acpiprt4 at acpi0: bus 3 (EXP2)
> > acpiprt5 at acpi0: bus -1 (EXP3)
> > acpiprt6 at acpi0: bus -1 (EXP6)
> > acpicpu0 at acpi0: C2(200@148 mwait.1@0x33), C1(1000@1 mwait.1), PSS
> > acpicpu1 at acpi0: C2(200@148 mwait.1@0x33), C1(1000@1 mwait.1), PSS
> > acpicpu2 at acpi0: C2(200@148 mwait.1@0x33), C1(1000@1 mwait.1), PSS
> > acpicpu3 at acpi0: C2(200@148 mwait.1@0x33), C1(1000@1 mwait.1), PSS
> > acpipwrres0 at acpi0: PUBS, resource for XHCI, EHC1, EHC2
> > acpipwrres1 at acpi0: NVP3, resource for PEG_
> > acpipwrres2 at acpi0: NVP2, resource for PEG_
> > acpitz0 at acpi0: critical temperature is 200 degC
> > acpibtn0 at acpi0: LID_
> > acpibtn1 at acpi0: SLPB
> > "LEN0071" at acpi0 not configured
> > "LEN0036" at acpi0 not configured
> > "SMO1200" at acpi0 not configured
> > acpibat0 at acpi0: BAT0 model "45N1161" serial 3584 type LION oem "LGC"
> > acpiac0 at acpi0: AC unit online
> > acpithinkpad0 at acpi0
> > "PNP0C14" at acpi0 not configured
> > "PNP0C14" at acpi0 not configured
> > "PNP0C14" at acpi0 not configured
> > "INT340F" at acpi0 not configured
> > acpivideo0 at acpi0: VID_
> > acpivout at acpivideo0 not configured
> > acpivideo1 at acpi0: VID_
> > cpu0: Enhanced SpeedStep 2594 MHz: speeds: 2601, 2600, 2500, 2300, 2200,
> > 2100, 2000, 1800, 1700, 1600, 1400, 1300, 1200, 1100, 900, 800 MHz
> > pci0 at mainbus0 bus 0
> > pchb0 at pci0 dev 0 function 0 "Intel Core 4G Host" rev 0x06
> > inteldrm0 at pci0 dev 2 function 0 "Intel HD Graphics 4600" rev 0x06
> > drm0 at inteldrm0
> > inteldrm0: msi
> > inteldrm0: 1920x1080, 32bpp
> > wsdisplay0 at inteldrm0 mux 1: console (std, vt100 emulation)
> > wsdisplay0: screen 1-5 added (std, vt100 emulation)
> > azalia0 at pci0 dev 3 function 0 "Intel Core 4G HD Audio" rev 0x06: msi
> > xhci0 at pci0 dev 20 function 0 "Intel 8 Series xHCI" rev 0x04: msi
> > usb0 at xhci0: USB revision 3.0
> > uhub0 at usb0 configuration 1 interface 0 "Intel xHCI root hub" rev
> > 3.00/1.00 addr 1
> > "Intel 8 Series MEI" rev 0x04 at pci0 dev 22 function 0 not configured
> > em0 at pci0 dev 25 function 0 "Intel I217-LM" rev 0x04: msi, address
> > xx:xx:xx:xx:xx:xx
> > ehci0 at pci0 dev 26 function 0 "Intel 8 Series USB" rev 0x04: apic 2 int 16
> > usb1 at ehci0: USB revision 2.0
> > uhub1 at usb1 configuration 1 interface 0 "Intel EHCI root hub" rev
> > 2.00/1.00 addr 1
> > azalia1 at pci0 dev 27 function 0 "Intel 8 Series HD Audio" rev 0x04: msi
> > azalia1: codecs: Realtek ALC292
> > audio0 at azalia1
> > ppb0 at pci0 dev 28 function 0 "Intel 8 Series PCIE" rev 0xd4: msi
> > pci1 at ppb0 bus 2
> > rtsx0 at pci1 dev 0 function 0 "Realtek RTS5227 Card Reader" rev 0x01: msi
> > sdmmc0 at rtsx0: 4-bit
> > ppb1 at pci0 dev 28 function 1 "Intel 8 Series PCIE" rev 0xd4: msi
> > pci2 at ppb1 bus 3
> > iwm0 at pci2 dev 0 function 0 "Intel Dual Band Wireless AC 7260" rev 0x83,
> > msi
> > ehci1 at pci0 dev 29 function 0 "Intel 8 Series USB" rev 0x04: apic 2 int 23
> > usb2 at ehci1: USB revision 2.0
> > uhub2 at usb2 configuration 1 interface 0 "Intel EHCI root hub" rev
> > 2.00/1.00 addr 1
> > pcib0 at pci0 dev 31 function 0 "Intel QM87 LPC" rev 0x04
> > ahci0 at pci0 dev 31 function 2 "Intel 8 Series AHCI" rev 0x04: msi, AHCI
> > 1.3
> > ahci0: port 0: 6.0Gb/s
> > ahci0: port 5: 1.5Gb/s
> > scsibus1 at ahci0: 32 targets
> > sd0 at scsibus1 targ 0 lun 0: <ATA, Samsung SSD 850, EMT0> SCSI3 0/direct
> > fixed naa.5002538d41895ee0
> > sd0: 476940MB, 512 bytes/sector, 976773168 sectors, thin
> > cd0 at scsibus1 targ 5 lun 0: <PLDS, DVD-RW DU8A5SH, BU51> ATAPI 5/cdrom
> > removable
> > ichiic0 at pci0 dev 31 function 3 "Intel 8 Series SMBus" rev 0x04: apic 2
> > int 18
> > iic0 at ichiic0
> > isa0 at pcib0
> > isadma0 at isa0
> > pckbc0 at isa0 port 0x60/5 irq 1 irq 12
> > pckbd0 at pckbc0 (kbd slot)
> > wskbd0 at pckbd0: console keyboard, using wsdisplay0
> > pms0 at pckbc0 (aux slot)
> > wsmouse0 at pms0 mux 0
> > wsmouse1 at pms0 mux 0
> > pms0: Synaptics clickpad, firmware 8.2, 0x1e2b1 0x943300
> > pcppi0 at isa0 port 0x61
> > spkr0 at pcppi0
> > vmm0 at mainbus0: VMX/EPT
> > error: [drm:pid0:intel_uncore_check_errors] *ERROR* Unclaimed register
> > before interrupt
> > umass0 at uhub0 port 2 configuration 1 interface 0 "SHARP Corporation
> > 305SH" rev 2.00/2.28 addr 2
> > umass0: using SCSI over Bulk-Only
> > scsibus2 at umass0: 2 targets, initiator 0
> > sd1 at scsibus2 targ 1 lun 0: <SHARP, 305SH microSD, 3.14> SCSI3 0/direct
> > removable serial.04dd97d5598055430410
> > uhidev0 at uhub0 port 3 configuration 1 interface 0 "WiseGroup.,Ltd
> > JC-PS101U" rev 1.00/2.88 addr 3
> > uhidev0: iclass 3/0
> > uhid0 at uhidev0: input=7, output=3, feature=0
> > uhidev1 at uhub0 port 6 configuration 1 interface 0 "Logitech USB Laser
> > Mouse" rev 2.00/56.01 addr 4
> > uhidev1: iclass 3/1
> > ums0 at uhidev1: 8 buttons, Z and W dir
> > wsmouse2 at ums0 mux 0
> > ugen0 at uhub0 port 7 "Validity Sensors VFS5011 Fingerprint Reader" rev
> > 1.10/0.78 addr 5
> > ugen1 at uhub0 port 11 "Intel product 0x07dc" rev 2.00/0.01 addr 6
> > sdmmc0: can't enable card
> > uvideo0 at uhub0 port 12 configuration 1 interface 0 "SunplusIT INC.
> > Integrated Camera" rev 2.00/0.03 addr 7
> > video0 at uvideo0
> > umass1 at uhub0 port 16 configuration 1 interface 0 "Seagate Backup+ Desk"
> > rev 3.00/3.42 addr 8
> > umass1: using SCSI over Bulk-Only
> > scsibus3 at umass1: 2 targets, initiator 0
> > sd2 at scsibus3 targ 1 lun 0: <Seagate, Backup+ Desk, 0342> SCSI4 0/direct
> > fixed
> > sd2: 4769307MB, 4096 bytes/sector, 1220942645 sectors
> > uhub3 at uhub1 port 1 configuration 1 interface 0 "Intel Rate Matching Hub"
> > rev 2.00/0.04 addr 2
> > uhub4 at uhub2 port 1 configuration 1 interface 0 "Intel Rate Matching Hub"
> > rev 2.00/0.04 addr 2
> > vscsi0 at root
> > scsibus4 at vscsi0: 256 targets
> > softraid0 at root
> > scsibus5 at softraid0: 256 targets
> >
> > usbdevs:
> > Controller /dev/usb0:
> > addr 1: super speed, self powered, config 1, xHCI root hub(0x0000),
> > Intel(0x8086), rev 1.00
> > port 1 disabled
> > port 2 disabled
> > port 3 addr 2: low speed, power 100 mA, config 1, JC-PS101U(0x8888),
> > WiseGroup.,Ltd(0x0925), rev 2.88
> > port 4 disabled
> > port 5 disabled
> > port 6 addr 3: low speed, power 98 mA, config 1, USB Laser Mouse(0xc069),
> > Logitech(0x046d), rev 56.01
> > port 7 addr 4: full speed, power 100 mA, config 1, VFS5011 Fingerprint
> > Reader(0x0017), Validity Sensors(0x138a), rev 0.78, iSerialNumber
> > 7f178585b00e
> > port 8 disabled
> > port 9 disabled
> > port 10 disabled
> > port 11 addr 5: full speed, self powered, config 1, product
> > 0x07dc(0x07dc), Intel(0x8087), rev 0.01
> > port 12 addr 6: high speed, power 500 mA, config 1, Integrated
> > Camera(0x0268), SunplusIT INC.(0x5986), rev 0.03
> > port 13 disabled
> > port 14 disabled
> > port 15 disabled
> > port 16 addr 7: super speed, self powered, config 1, Backup+
> > Desk(0xab31), Seagate(0x0bc2), rev 3.42, iSerialNumber NA7EA2SZ
> > Controller /dev/usb1:
> > addr 1: high speed, self powered, config 1, EHCI root hub(0x0000),
> > Intel(0x8086), rev 1.00
> > port 1 addr 2: high speed, self powered, config 1, Rate Matching
> > Hub(0x8008), Intel(0x8087), rev 0.04
> > port 1 powered
> > port 2 powered
> > port 3 powered
> > port 4 powered
> > port 5 powered
> > port 6 powered
> > port 2 powered
> > port 3 powered
> > Controller /dev/usb2:
> > addr 1: high speed, self powered, config 1, EHCI root hub(0x0000),
> > Intel(0x8086), rev 1.00
> > port 1 addr 2: high speed, self powered, config 1, Rate Matching
> > Hub(0x8000), Intel(0x8087), rev 0.04
> > port 1 powered
> > port 2 powered
> > port 3 powered
> > port 4 powered
> > port 5 powered
> > port 6 powered
> > port 7 powered
> > port 8 powered
> > port 2 powered
> > port 3 powered
> >
>