FWIW, I get the same behavior on GENERIC.MP, so I don't think the 
PPPOE_TERM_UNKNOWN_SESSIONS kernel option is causing this.

If I can provide any more information, please let me know.

-- 
Bryan

On 2017-09-03 05:59:51, Bryan Linton <[email protected]> wrote:
> >Synopsis:    em0 loses connectivity due to low mbufs
> >Category:    system
> >Environment:
>       System      : OpenBSD 6.2
>       Details     : OpenBSD 6.2-beta (GENERIC.MP-PPPOE_TERM_UNKNOWN_SESSIONS) #22: Wed Aug 30 19:23:17 JST 2017
>                        [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP-PPPOE_TERM_UNKNOWN_SESSIONS
> 
>       Architecture: OpenBSD.amd64
>       Machine     : amd64
> >Description:
> 
> I've been seeing random drops in network connectivity that I've
> traced to what appears to be too few mbufs being allocated to the
> interface.  The issue is seen on the following em0 controller in a
> ThinkPad T440p:
> 
>       em0 at pci0 dev 25 function 0 "Intel I217-LM" rev 0x04: msi
> 
> A freshly booted system works fine.  Heavy network use (for
> example, transferring large files over a LAN) makes connectivity
> drop sooner, but even light use eventually triggers the issue.
> 
> Rebooting always fixes the issue.  Issuing a "zzz" command will
> also fix it most of the time, but not always.  Some of the time, a
> simple "ifconfig em0 down up" will fix it, but usually only once
> or twice.  After that, the system must either be zzz'ed or
> rebooted.
> 
> I've attached "systat mbuf" output for various states of "working"
> vs. "not working".
> 
> Note how the ALIVE value drops below the LWM value when it's not
> working.
> 
> FULL DISCLOSURE: I am running a kernel with the
> PPPOE_TERM_UNKNOWN_SESSIONS option set.  I do not believe it
> would affect the em0 driver, but I suppose it's possible that it
> could.
> 
> vvvvvvvvvvvvvvvvvvv WORKING vvvvvvvvvvvvvvvvvvvvv
> 
> IFACE             LIVELOCKS  SIZE ALIVE   LWM   HWM   CWM
> System                    0   256   418          30
>                              2048    17           9
>                              2112    15           5
>                              4096   256          37
> lo0
> em0                          2050    15    10   256    15
> iwm0
> enc0
> pppoe0
> pflog0
> 
> 
> 
> 
> vvvvvvv NON-WORKING (immediately after connectivity is lost) vvvvvvvvv
> (Note that ALIVE is lower than LWM and CWM is currently 145)
> 
> IFACE             LIVELOCKS  SIZE ALIVE   LWM   HWM   CWM
> System                    0   256  1509         160
>                              2048    31          29
>                              2112  1036          72
>                              4096   256          41
> lo0
> em0                          2050     2    10   256   145
> iwm0
> enc0
> pppoe0
> pflog0
> 
> 
> vvvvvvvvvvvvvvv NON-WORKING (after a minute or so) vvvvvvvvvvvvvvvvv
> (Note how the CWM has steadily increased to 256 over the last
> minute)
> 
> IFACE             LIVELOCKS  SIZE ALIVE   LWM   HWM   CWM
> System                    0   256  1656         160
>                              2048    32          29
>                              2112  1179          81
>                              4096   256          41
> lo0
> em0                          2050     2    10   256   256
> iwm0
> enc0
> pppoe0
> pflog0
> 
> 
> vvvvvvvvvvvvvvv NON-WORKING (after "ifconfig em0 down up") vvvvvvvvvvvv
> (In this instance, "ifconfig em0 down up" didn't work, but it did
> reset CWM back to LWM)
> 
> IFACE             LIVELOCKS  SIZE ALIVE   LWM   HWM   CWM
> System                    0   256  1711         160
>                              2048    43          29
>                              2112  1191          82
>                              4096   256          41
> lo0
> em0                          2050     1    10   256    10
> iwm0
> enc0
> pppoe0
> pflog0
> 
> 
> vvvvvvvvvvv NON-WORKING (after repeated "ifconfig em0 down up") vvvvvvvv
> (Now there is nothing reported for em0 at all.  After getting into
> this state, dmesg showed the following lines:
> 
> em0: unable to fill any rx descriptors
> em0: unable to fill any rx descriptors
> em0: unable to fill any rx descriptors
> em0: unable to fill any rx descriptors
> em0: unable to fill any rx descriptors
> em0: unable to fill any rx descriptors
> em0: unable to fill any rx descriptors
> )
> 
> IFACE             LIVELOCKS  SIZE ALIVE   LWM   HWM   CWM
> System                    0   256  1742         160
>                              2048    54          29
>                              2112  1215          84
>                              4096   256          41
> lo0
> em0                          
> iwm0
> enc0
> pppoe0
> pflog0
> -----------------------------------------------
> 
> I am willing to provide any additional needed information, as well
> as test any potential patches.  Please let me know if I can
> provide any additional details.
> 
> 
> >How-To-Repeat:
>       Saturate the network connection.  Eventually, the system
>       will stop receiving network data.
> >Fix:
>       Temporary fix:  Either issue "ifconfig em0 down up", "zzz",
>       or reboot.
> 
>       Permanent fix:  Unknown.  I attempted to revert, in turn,
>       the if_em* files all the way back to a Jan 23rd commit to
>       see if it was due to any recent commits there, but the
>       kernel panicked upon booting when the if_em* files were
>       reverted to that point.  I think there has been too much
>       progress in the rest of the system to successfully revert
>       that far back.
> 
> 
> dmesg:
> OpenBSD 6.2-beta (GENERIC.MP-PPPOE_TERM_UNKNOWN_SESSIONS) #22: Wed Aug 30 19:23:17 JST 2017
>     [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP-PPPOE_TERM_UNKNOWN_SESSIONS
> real mem = 12539871232 (11958MB)
> avail mem = 12152803328 (11589MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xbcc0d000 (67 entries)
> bios0: vendor LENOVO version "GLET85WW (2.39 )" date 09/29/2016
> bios0: LENOVO 20AWS27D00
> acpi0 at bios0: rev 2
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP SLIC DBGP ECDT HPET APIC MCFG SSDT SSDT SSDT SSDT 
> SSDT SSDT SSDT PCCT SSDT TCPA UEFI MSDM ASF! BATB FPDT UEFI DMAR
> acpi0: wakeup devices LID_(S4) SLPB(S3) IGBE(S4) EXP2(S4) EXP3(S4) XHCI(S3) 
> EHC1(S3) EHC2(S3)
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpiec0 at acpi0
> acpihpet0 at acpi0: 14318179 Hz
> acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Core(TM) i5-4300M CPU @ 2.60GHz, 2594.37 MHz
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SENSOR,ARAT
> cpu0: 256KB 64b/line 8-way L2 cache
> cpu0: TSC frequency 2594368320 Hz
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 99MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4, IBE
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: Intel(R) Core(TM) i5-4300M CPU @ 2.60GHz, 2593.99 MHz
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SENSOR,ARAT
> cpu1: 256KB 64b/line 8-way L2 cache
> cpu1: smt 1, core 0, package 0
> cpu2 at mainbus0: apid 2 (application processor)
> cpu2: Intel(R) Core(TM) i5-4300M CPU @ 2.60GHz, 2593.99 MHz
> cpu2: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SENSOR,ARAT
> cpu2: 256KB 64b/line 8-way L2 cache
> cpu2: smt 0, core 1, package 0
> cpu3 at mainbus0: apid 3 (application processor)
> cpu3: Intel(R) Core(TM) i5-4300M CPU @ 2.60GHz, 2593.99 MHz
> cpu3: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SENSOR,ARAT
> cpu3: 256KB 64b/line 8-way L2 cache
> cpu3: smt 1, core 1, package 0
> ioapic0 at mainbus0: apid 2 pa 0xfec00000, version 20, 24 pins
> acpimcfg0 at acpi0 addr 0xf8000000, bus 0-63
> acpiprt0 at acpi0: bus 0 (PCI0)
> acpiprt1 at acpi0: bus -1 (PEG0)
> acpiprt2 at acpi0: bus -1 (PEG_)
> acpiprt3 at acpi0: bus 2 (EXP1)
> acpiprt4 at acpi0: bus 3 (EXP2)
> acpiprt5 at acpi0: bus -1 (EXP3)
> acpiprt6 at acpi0: bus -1 (EXP6)
> acpicpu0 at acpi0: C2(200@148 mwait.1@0x33), C1(1000@1 mwait.1), PSS
> acpicpu1 at acpi0: C2(200@148 mwait.1@0x33), C1(1000@1 mwait.1), PSS
> acpicpu2 at acpi0: C2(200@148 mwait.1@0x33), C1(1000@1 mwait.1), PSS
> acpicpu3 at acpi0: C2(200@148 mwait.1@0x33), C1(1000@1 mwait.1), PSS
> acpipwrres0 at acpi0: PUBS, resource for XHCI, EHC1, EHC2
> acpipwrres1 at acpi0: NVP3, resource for PEG_
> acpipwrres2 at acpi0: NVP2, resource for PEG_
> acpitz0 at acpi0: critical temperature is 200 degC
> acpibtn0 at acpi0: LID_
> acpibtn1 at acpi0: SLPB
> "LEN0071" at acpi0 not configured
> "LEN0036" at acpi0 not configured
> "SMO1200" at acpi0 not configured
> acpibat0 at acpi0: BAT0 model "45N1161" serial  3584 type LION oem "LGC"
> acpiac0 at acpi0: AC unit online
> acpithinkpad0 at acpi0
> "PNP0C14" at acpi0 not configured
> "PNP0C14" at acpi0 not configured
> "PNP0C14" at acpi0 not configured
> "INT340F" at acpi0 not configured
> acpivideo0 at acpi0: VID_
> acpivout at acpivideo0 not configured
> acpivideo1 at acpi0: VID_
> cpu0: Enhanced SpeedStep 2594 MHz: speeds: 2601, 2600, 2500, 2300, 2200, 
> 2100, 2000, 1800, 1700, 1600, 1400, 1300, 1200, 1100, 900, 800 MHz
> pci0 at mainbus0 bus 0
> pchb0 at pci0 dev 0 function 0 "Intel Core 4G Host" rev 0x06
> inteldrm0 at pci0 dev 2 function 0 "Intel HD Graphics 4600" rev 0x06
> drm0 at inteldrm0
> inteldrm0: msi
> inteldrm0: 1920x1080, 32bpp
> wsdisplay0 at inteldrm0 mux 1: console (std, vt100 emulation)
> wsdisplay0: screen 1-5 added (std, vt100 emulation)
> azalia0 at pci0 dev 3 function 0 "Intel Core 4G HD Audio" rev 0x06: msi
> xhci0 at pci0 dev 20 function 0 "Intel 8 Series xHCI" rev 0x04: msi
> usb0 at xhci0: USB revision 3.0
> uhub0 at usb0 configuration 1 interface 0 "Intel xHCI root hub" rev 3.00/1.00 
> addr 1
> "Intel 8 Series MEI" rev 0x04 at pci0 dev 22 function 0 not configured
> em0 at pci0 dev 25 function 0 "Intel I217-LM" rev 0x04: msi, address 
> xx:xx:xx:xx:xx:xx
> ehci0 at pci0 dev 26 function 0 "Intel 8 Series USB" rev 0x04: apic 2 int 16
> usb1 at ehci0: USB revision 2.0
> uhub1 at usb1 configuration 1 interface 0 "Intel EHCI root hub" rev 2.00/1.00 
> addr 1
> azalia1 at pci0 dev 27 function 0 "Intel 8 Series HD Audio" rev 0x04: msi
> azalia1: codecs: Realtek ALC292
> audio0 at azalia1
> ppb0 at pci0 dev 28 function 0 "Intel 8 Series PCIE" rev 0xd4: msi
> pci1 at ppb0 bus 2
> rtsx0 at pci1 dev 0 function 0 "Realtek RTS5227 Card Reader" rev 0x01: msi
> sdmmc0 at rtsx0: 4-bit
> ppb1 at pci0 dev 28 function 1 "Intel 8 Series PCIE" rev 0xd4: msi
> pci2 at ppb1 bus 3
> iwm0 at pci2 dev 0 function 0 "Intel Dual Band Wireless AC 7260" rev 0x83, msi
> ehci1 at pci0 dev 29 function 0 "Intel 8 Series USB" rev 0x04: apic 2 int 23
> usb2 at ehci1: USB revision 2.0
> uhub2 at usb2 configuration 1 interface 0 "Intel EHCI root hub" rev 2.00/1.00 
> addr 1
> pcib0 at pci0 dev 31 function 0 "Intel QM87 LPC" rev 0x04
> ahci0 at pci0 dev 31 function 2 "Intel 8 Series AHCI" rev 0x04: msi, AHCI 1.3
> ahci0: port 0: 6.0Gb/s
> ahci0: port 5: 1.5Gb/s
> scsibus1 at ahci0: 32 targets
> sd0 at scsibus1 targ 0 lun 0: <ATA, Samsung SSD 850, EMT0> SCSI3 0/direct 
> fixed naa.5002538d41895ee0
> sd0: 476940MB, 512 bytes/sector, 976773168 sectors, thin
> cd0 at scsibus1 targ 5 lun 0: <PLDS, DVD-RW DU8A5SH, BU51> ATAPI 5/cdrom 
> removable
> ichiic0 at pci0 dev 31 function 3 "Intel 8 Series SMBus" rev 0x04: apic 2 int 
> 18
> iic0 at ichiic0
> isa0 at pcib0
> isadma0 at isa0
> pckbc0 at isa0 port 0x60/5 irq 1 irq 12
> pckbd0 at pckbc0 (kbd slot)
> wskbd0 at pckbd0: console keyboard, using wsdisplay0
> pms0 at pckbc0 (aux slot)
> wsmouse0 at pms0 mux 0
> wsmouse1 at pms0 mux 0
> pms0: Synaptics clickpad, firmware 8.2, 0x1e2b1 0x943300
> pcppi0 at isa0 port 0x61
> spkr0 at pcppi0
> vmm0 at mainbus0: VMX/EPT
> error: [drm:pid0:intel_uncore_check_errors] *ERROR* Unclaimed register before 
> interrupt
> umass0 at uhub0 port 2 configuration 1 interface 0 "SHARP Corporation 305SH" 
> rev 2.00/2.28 addr 2
> umass0: using SCSI over Bulk-Only
> scsibus2 at umass0: 2 targets, initiator 0
> sd1 at scsibus2 targ 1 lun 0: <SHARP, 305SH microSD, 3.14> SCSI3 0/direct 
> removable serial.04dd97d5598055430410
> uhidev0 at uhub0 port 3 configuration 1 interface 0 "WiseGroup.,Ltd 
> JC-PS101U" rev 1.00/2.88 addr 3
> uhidev0: iclass 3/0
> uhid0 at uhidev0: input=7, output=3, feature=0
> uhidev1 at uhub0 port 6 configuration 1 interface 0 "Logitech USB Laser 
> Mouse" rev 2.00/56.01 addr 4
> uhidev1: iclass 3/1
> ums0 at uhidev1: 8 buttons, Z and W dir
> wsmouse2 at ums0 mux 0
> ugen0 at uhub0 port 7 "Validity Sensors VFS5011 Fingerprint Reader" rev 
> 1.10/0.78 addr 5
> ugen1 at uhub0 port 11 "Intel product 0x07dc" rev 2.00/0.01 addr 6
> sdmmc0: can't enable card
> uvideo0 at uhub0 port 12 configuration 1 interface 0 "SunplusIT INC. 
> Integrated Camera" rev 2.00/0.03 addr 7
> video0 at uvideo0
> umass1 at uhub0 port 16 configuration 1 interface 0 "Seagate Backup+  Desk" 
> rev 3.00/3.42 addr 8
> umass1: using SCSI over Bulk-Only
> scsibus3 at umass1: 2 targets, initiator 0
> sd2 at scsibus3 targ 1 lun 0: <Seagate, Backup+ Desk, 0342> SCSI4 0/direct 
> fixed
> sd2: 4769307MB, 4096 bytes/sector, 1220942645 sectors
> uhub3 at uhub1 port 1 configuration 1 interface 0 "Intel Rate Matching Hub" 
> rev 2.00/0.04 addr 2
> uhub4 at uhub2 port 1 configuration 1 interface 0 "Intel Rate Matching Hub" 
> rev 2.00/0.04 addr 2
> vscsi0 at root
> scsibus4 at vscsi0: 256 targets
> softraid0 at root
> scsibus5 at softraid0: 256 targets
> 
> usbdevs:
> Controller /dev/usb0:
> addr 1: super speed, self powered, config 1, xHCI root hub(0x0000), 
> Intel(0x8086), rev 1.00
>  port 1 disabled
>  port 2 disabled
>  port 3 addr 2: low speed, power 100 mA, config 1, JC-PS101U(0x8888), 
> WiseGroup.,Ltd(0x0925), rev 2.88
>  port 4 disabled
>  port 5 disabled
>  port 6 addr 3: low speed, power 98 mA, config 1, USB Laser Mouse(0xc069), 
> Logitech(0x046d), rev 56.01
>  port 7 addr 4: full speed, power 100 mA, config 1, VFS5011 Fingerprint 
> Reader(0x0017), Validity Sensors(0x138a), rev 0.78, iSerialNumber 7f178585b00e
>  port 8 disabled
>  port 9 disabled
>  port 10 disabled
>  port 11 addr 5: full speed, self powered, config 1, product 0x07dc(0x07dc), 
> Intel(0x8087), rev 0.01
>  port 12 addr 6: high speed, power 500 mA, config 1, Integrated 
> Camera(0x0268), SunplusIT INC.(0x5986), rev 0.03
>  port 13 disabled
>  port 14 disabled
>  port 15 disabled
>  port 16 addr 7: super speed, self powered, config 1, Backup+  Desk(0xab31), 
> Seagate(0x0bc2), rev 3.42, iSerialNumber NA7EA2SZ
> Controller /dev/usb1:
> addr 1: high speed, self powered, config 1, EHCI root hub(0x0000), 
> Intel(0x8086), rev 1.00
>  port 1 addr 2: high speed, self powered, config 1, Rate Matching 
> Hub(0x8008), Intel(0x8087), rev 0.04
>   port 1 powered
>   port 2 powered
>   port 3 powered
>   port 4 powered
>   port 5 powered
>   port 6 powered
>  port 2 powered
>  port 3 powered
> Controller /dev/usb2:
> addr 1: high speed, self powered, config 1, EHCI root hub(0x0000), 
> Intel(0x8086), rev 1.00
>  port 1 addr 2: high speed, self powered, config 1, Rate Matching 
> Hub(0x8000), Intel(0x8087), rev 0.04
>   port 1 powered
>   port 2 powered
>   port 3 powered
>   port 4 powered
>   port 5 powered
>   port 6 powered
>   port 7 powered
>   port 8 powered
>  port 2 powered
>  port 3 powered
> 
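
The symptom in the quoted report (ALIVE dropping below LWM for em0 in
"systat mbuf") could be checked for automatically.  What follows is only a
rough sketch, not anything from the OpenBSD tree: the function names
parse_systat_mbuf and starved are made up for illustration, and it assumes
that an interface row carries its numbers in the order SIZE ALIVE LWM HWM
CWM, as in the output pasted above.  It reads text that has already been
captured; it does not run systat itself.

```python
import re
from typing import Dict, List, NamedTuple


class RingStats(NamedTuple):
    size: int
    alive: int
    lwm: int
    hwm: int
    cwm: int


def parse_systat_mbuf(text: str) -> Dict[str, RingStats]:
    """Collect interface rows that carry SIZE ALIVE LWM HWM CWM values."""
    rings: Dict[str, RingStats] = {}
    for raw in text.splitlines():
        # Tolerate mail quoting ("> ") in pasted output.
        fields = re.sub(r"^(>\s?)+", "", raw).split()
        # Skip the header, the System pool rows and interfaces with no data.
        if len(fields) < 6 or fields[0] in ("IFACE", "System"):
            continue
        name, values = fields[0], fields[1:]
        if name.isdigit() or not all(v.isdigit() for v in values):
            continue
        # Assumption: the last five numbers are SIZE ALIVE LWM HWM CWM.
        rings[name] = RingStats(*(int(v) for v in values[-5:]))
    return rings


def starved(rings: Dict[str, RingStats]) -> List[str]:
    """Names of interfaces whose ALIVE count is below the low watermark."""
    return sorted(name for name, r in rings.items() if r.alive < r.lwm)


# Rows copied from the "immediately after connectivity is lost" capture.
sample = """\
IFACE             LIVELOCKS  SIZE ALIVE   LWM   HWM   CWM
System                    0   256  1509         160
                             2048    31          29
lo0
em0                          2050     2    10   256   145
iwm0
"""

print(starved(parse_systat_mbuf(sample)))  # prints ['em0']
```

Fed the "WORKING" capture instead (ALIVE 15, LWM 10), the list comes back
empty.  Rows with no numbers, such as em0 in the last capture, are skipped
rather than flagged, so that state would need a separate check.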
