>Synopsis:      em0 loses connectivity due to low mbufs
>Category:      system
>Environment:
        System      : OpenBSD 6.2
        Details     : OpenBSD 6.2-beta (GENERIC.MP-PPPOE_TERM_UNKNOWN_SESSIONS) #22: Wed Aug 30 19:23:17 JST 2017
                         [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP-PPPOE_TERM_UNKNOWN_SESSIONS

        Architecture: OpenBSD.amd64
        Machine     : amd64
>Description:

I've been seeing random drops in network connectivity that I've
traced to what appears to be em0 running short of mbufs.  The
issue occurs on the following controller in a ThinkPad T440p:

        em0 at pci0 dev 25 function 0 "Intel I217-LM" rev 0x04: msi

A freshly booted system will work fine, but heavily using the
network (for example, by transferring large files over a LAN) will
cause connectivity to drop sooner rather than later.  But even
only lightly using the network will eventually cause the issue to
surface.

Rebooting always fixes the issue.  Issuing a "zzz" command will
also fix it most of the time, but not always.  Some of the time, a
simple "ifconfig em0 down up" will fix it, but usually only once
or twice.  After that, the system must either be zzz'ed or
rebooted.

I've included "systat mbuf" output below for various states of
"working" vs. "not working".

Note how the ALIVE value drops below the LWM value when it's not
working.
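
For anyone who wants to watch for this condition automatically, here
is a minimal Python sketch that flags interfaces whose ALIVE count is
below LWM.  It assumes interface rows look like the em0 rows in this
report (name, optionally LIVELOCKS, then SIZE ALIVE LWM HWM CWM); the
function names are mine, and the sample text is copied from the
output below.

```python
def parse_rx_rings(text):
    """Return {iface: {...}} for rows that carry SIZE ALIVE LWM HWM CWM.

    Assumption: an interface row is a name, an optional LIVELOCKS
    count, and then exactly five numeric columns, as in this report.
    """
    rings = {}
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) not in (6, 7) or tokens[0] in ("IFACE", "System"):
            continue
        if not all(t.isdigit() for t in tokens[1:]):
            continue
        size, alive, lwm, hwm, cwm = (int(t) for t in tokens[-5:])
        rings[tokens[0]] = {
            "size": size, "alive": alive, "lwm": lwm, "hwm": hwm, "cwm": cwm,
        }
    return rings


def starved(rings):
    """Names of interfaces whose ALIVE count has fallen below LWM."""
    return sorted(name for name, r in rings.items() if r["alive"] < r["lwm"])


# Sample rows copied from the "working" and first "non-working" captures.
WORKING = """\
IFACE             LIVELOCKS  SIZE ALIVE   LWM   HWM   CWM
System                    0   256   418          30
lo0
em0                          2050    15    10   256    15
iwm0
"""

NON_WORKING = """\
IFACE             LIVELOCKS  SIZE ALIVE   LWM   HWM   CWM
System                    0   256  1509         160
lo0
em0                          2050     2    10   256   145
iwm0
"""

print(starved(parse_rx_rings(WORKING)))
print(starved(parse_rx_rings(NON_WORKING)))
```

The text to feed it could come from a saved capture of the systat
display; how best to capture that non-interactively is left open here.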

FULL DISCLOSURE: I am running a kernel with the
PPPOE_TERM_UNKNOWN_SESSIONS option set.  I do not believe it
would affect the em0 driver, but I suppose it's possible that it
could.

vvvvvvvvvvvvvvvvvvv WORKING vvvvvvvvvvvvvvvvvvvvv

IFACE             LIVELOCKS  SIZE ALIVE   LWM   HWM   CWM
System                    0   256   418          30
                             2048    17           9
                             2112    15           5
                             4096   256          37
lo0
em0                          2050    15    10   256    15
iwm0
enc0
pppoe0
pflog0




vvvvvvv NON-WORKING (immediately after connectivity is lost) vvvvvvvvv
(Note that ALIVE is lower than LWM and CWM is currently 145)

IFACE             LIVELOCKS  SIZE ALIVE   LWM   HWM   CWM
System                    0   256  1509         160
                             2048    31          29
                             2112  1036          72
                             4096   256          41
lo0
em0                          2050     2    10   256   145
iwm0
enc0
pppoe0
pflog0


vvvvvvvvvvvvvvv NON-WORKING (after a minute or so) vvvvvvvvvvvvvvvvv
(Note how the CWM has steadily increased to 256 over the last
minute)

IFACE             LIVELOCKS  SIZE ALIVE   LWM   HWM   CWM
System                    0   256  1656         160
                             2048    32          29
                             2112  1179          81
                             4096   256          41
lo0
em0                          2050     2    10   256   256
iwm0
enc0
pppoe0
pflog0


vvvvvvvvvvvvvvv NON-WORKING (after "ifconfig em0 down up") vvvvvvvvvvvv
(In this instance, "ifconfig em0 down up" didn't work, but it did
reset CWM back to LWM)

IFACE             LIVELOCKS  SIZE ALIVE   LWM   HWM   CWM
System                    0   256  1711         160
                             2048    43          29
                             2112  1191          82
                             4096   256          41
lo0
em0                          2050     1    10   256    10
iwm0
enc0
pppoe0
pflog0


vvvvvvvvvvv NON-WORKING (after repeated "ifconfig em0 down up") vvvvvvvv
(Now there is nothing reported for em0 at all.  After getting into
this state, dmesg showed the following lines:

em0: unable to fill any rx descriptors
em0: unable to fill any rx descriptors
em0: unable to fill any rx descriptors
em0: unable to fill any rx descriptors
em0: unable to fill any rx descriptors
em0: unable to fill any rx descriptors
em0: unable to fill any rx descriptors
)

IFACE             LIVELOCKS  SIZE ALIVE   LWM   HWM   CWM
System                    0   256  1742         160
                             2048    54          29
                             2112  1215          84
                             4096   256          41
lo0
em0                          
iwm0
enc0
pppoe0
pflog0
-----------------------------------------------

I am willing to provide any additional needed information, as well
as test any potential patches.  Please let me know if I can
provide any additional details.


>How-To-Repeat:
        Saturate the network connection.  Eventually, the system
        will stop receiving network data.
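
        One way to generate sustained load without copying real
        files is a pair of small TCP helpers like the Python
        sketch below.  This is not what I used (I transferred
        large files over the LAN); the function names, chunk
        size, and the idea of running the receiver on the
        affected machine are assumptions for illustration.

```python
import socket

CHUNK = 64 * 1024  # bytes per send/recv call; arbitrary choice


def drain(server_sock):
    """Accept one connection, discard everything received, return the byte count."""
    conn, _ = server_sock.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:  # peer closed the connection
                break
            total += len(data)
    return total


def blast(host, port, nbytes):
    """Send nbytes of zero bytes to host:port as fast as TCP allows."""
    payload = bytes(CHUNK)
    sent = 0
    with socket.create_connection((host, port)) as s:
        while sent < nbytes:
            n = min(CHUNK, nbytes - sent)
            s.sendall(payload[:n])
            sent += n
    return sent
```

        To exercise the receive side of em0, bind and listen on a
        socket on the affected machine, pass it to drain(), and
        call blast() from another host on the LAN with a large
        byte count.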
>Fix:
        Temporary fix:  Issue "ifconfig em0 down up" or "zzz", or
        reboot.  Only rebooting has worked every time.

        Permanent fix:  Unknown.  I attempted to revert, in turn,
        the if_em* files all the way back to a Jan 23rd commit to
        see if it was due to any recent commits there, but the
        kernel panicked upon booting when the if_em* files were
        reverted to that point.  I think there has been too much
        progress in the rest of the system to successfully revert
        to a point that far back.


dmesg:
OpenBSD 6.2-beta (GENERIC.MP-PPPOE_TERM_UNKNOWN_SESSIONS) #22: Wed Aug 30 19:23:17 JST 2017
    [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP-PPPOE_TERM_UNKNOWN_SESSIONS
real mem = 12539871232 (11958MB)
avail mem = 12152803328 (11589MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xbcc0d000 (67 entries)
bios0: vendor LENOVO version "GLET85WW (2.39 )" date 09/29/2016
bios0: LENOVO 20AWS27D00
acpi0 at bios0: rev 2
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP SLIC DBGP ECDT HPET APIC MCFG SSDT SSDT SSDT SSDT SSDT SSDT SSDT PCCT SSDT TCPA UEFI MSDM ASF! BATB FPDT UEFI DMAR
acpi0: wakeup devices LID_(S4) SLPB(S3) IGBE(S4) EXP2(S4) EXP3(S4) XHCI(S3) EHC1(S3) EHC2(S3)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpiec0 at acpi0
acpihpet0 at acpi0: 14318179 Hz
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i5-4300M CPU @ 2.60GHz, 2594.37 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SENSOR,ARAT
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: TSC frequency 2594368320 Hz
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4, IBE
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Core(TM) i5-4300M CPU @ 2.60GHz, 2593.99 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SENSOR,ARAT
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 1, core 0, package 0
cpu2 at mainbus0: apid 2 (application processor)
cpu2: Intel(R) Core(TM) i5-4300M CPU @ 2.60GHz, 2593.99 MHz
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SENSOR,ARAT
cpu2: 256KB 64b/line 8-way L2 cache
cpu2: smt 0, core 1, package 0
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Intel(R) Core(TM) i5-4300M CPU @ 2.60GHz, 2593.99 MHz
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SENSOR,ARAT
cpu3: 256KB 64b/line 8-way L2 cache
cpu3: smt 1, core 1, package 0
ioapic0 at mainbus0: apid 2 pa 0xfec00000, version 20, 24 pins
acpimcfg0 at acpi0 addr 0xf8000000, bus 0-63
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus -1 (PEG0)
acpiprt2 at acpi0: bus -1 (PEG_)
acpiprt3 at acpi0: bus 2 (EXP1)
acpiprt4 at acpi0: bus 3 (EXP2)
acpiprt5 at acpi0: bus -1 (EXP3)
acpiprt6 at acpi0: bus -1 (EXP6)
acpicpu0 at acpi0: C2(200@148 mwait.1@0x33), C1(1000@1 mwait.1), PSS
acpicpu1 at acpi0: C2(200@148 mwait.1@0x33), C1(1000@1 mwait.1), PSS
acpicpu2 at acpi0: C2(200@148 mwait.1@0x33), C1(1000@1 mwait.1), PSS
acpicpu3 at acpi0: C2(200@148 mwait.1@0x33), C1(1000@1 mwait.1), PSS
acpipwrres0 at acpi0: PUBS, resource for XHCI, EHC1, EHC2
acpipwrres1 at acpi0: NVP3, resource for PEG_
acpipwrres2 at acpi0: NVP2, resource for PEG_
acpitz0 at acpi0: critical temperature is 200 degC
acpibtn0 at acpi0: LID_
acpibtn1 at acpi0: SLPB
"LEN0071" at acpi0 not configured
"LEN0036" at acpi0 not configured
"SMO1200" at acpi0 not configured
acpibat0 at acpi0: BAT0 model "45N1161" serial  3584 type LION oem "LGC"
acpiac0 at acpi0: AC unit online
acpithinkpad0 at acpi0
"PNP0C14" at acpi0 not configured
"PNP0C14" at acpi0 not configured
"PNP0C14" at acpi0 not configured
"INT340F" at acpi0 not configured
acpivideo0 at acpi0: VID_
acpivout at acpivideo0 not configured
acpivideo1 at acpi0: VID_
cpu0: Enhanced SpeedStep 2594 MHz: speeds: 2601, 2600, 2500, 2300, 2200, 2100, 2000, 1800, 1700, 1600, 1400, 1300, 1200, 1100, 900, 800 MHz
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel Core 4G Host" rev 0x06
inteldrm0 at pci0 dev 2 function 0 "Intel HD Graphics 4600" rev 0x06
drm0 at inteldrm0
inteldrm0: msi
inteldrm0: 1920x1080, 32bpp
wsdisplay0 at inteldrm0 mux 1: console (std, vt100 emulation)
wsdisplay0: screen 1-5 added (std, vt100 emulation)
azalia0 at pci0 dev 3 function 0 "Intel Core 4G HD Audio" rev 0x06: msi
xhci0 at pci0 dev 20 function 0 "Intel 8 Series xHCI" rev 0x04: msi
usb0 at xhci0: USB revision 3.0
uhub0 at usb0 configuration 1 interface 0 "Intel xHCI root hub" rev 3.00/1.00 addr 1
"Intel 8 Series MEI" rev 0x04 at pci0 dev 22 function 0 not configured
em0 at pci0 dev 25 function 0 "Intel I217-LM" rev 0x04: msi, address xx:xx:xx:xx:xx:xx
ehci0 at pci0 dev 26 function 0 "Intel 8 Series USB" rev 0x04: apic 2 int 16
usb1 at ehci0: USB revision 2.0
uhub1 at usb1 configuration 1 interface 0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
azalia1 at pci0 dev 27 function 0 "Intel 8 Series HD Audio" rev 0x04: msi
azalia1: codecs: Realtek ALC292
audio0 at azalia1
ppb0 at pci0 dev 28 function 0 "Intel 8 Series PCIE" rev 0xd4: msi
pci1 at ppb0 bus 2
rtsx0 at pci1 dev 0 function 0 "Realtek RTS5227 Card Reader" rev 0x01: msi
sdmmc0 at rtsx0: 4-bit
ppb1 at pci0 dev 28 function 1 "Intel 8 Series PCIE" rev 0xd4: msi
pci2 at ppb1 bus 3
iwm0 at pci2 dev 0 function 0 "Intel Dual Band Wireless AC 7260" rev 0x83, msi
ehci1 at pci0 dev 29 function 0 "Intel 8 Series USB" rev 0x04: apic 2 int 23
usb2 at ehci1: USB revision 2.0
uhub2 at usb2 configuration 1 interface 0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
pcib0 at pci0 dev 31 function 0 "Intel QM87 LPC" rev 0x04
ahci0 at pci0 dev 31 function 2 "Intel 8 Series AHCI" rev 0x04: msi, AHCI 1.3
ahci0: port 0: 6.0Gb/s
ahci0: port 5: 1.5Gb/s
scsibus1 at ahci0: 32 targets
sd0 at scsibus1 targ 0 lun 0: <ATA, Samsung SSD 850, EMT0> SCSI3 0/direct fixed naa.5002538d41895ee0
sd0: 476940MB, 512 bytes/sector, 976773168 sectors, thin
cd0 at scsibus1 targ 5 lun 0: <PLDS, DVD-RW DU8A5SH, BU51> ATAPI 5/cdrom removable
ichiic0 at pci0 dev 31 function 3 "Intel 8 Series SMBus" rev 0x04: apic 2 int 18
iic0 at ichiic0
isa0 at pcib0
isadma0 at isa0
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
wsmouse0 at pms0 mux 0
wsmouse1 at pms0 mux 0
pms0: Synaptics clickpad, firmware 8.2, 0x1e2b1 0x943300
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
vmm0 at mainbus0: VMX/EPT
error: [drm:pid0:intel_uncore_check_errors] *ERROR* Unclaimed register before interrupt
umass0 at uhub0 port 2 configuration 1 interface 0 "SHARP Corporation 305SH" rev 2.00/2.28 addr 2
umass0: using SCSI over Bulk-Only
scsibus2 at umass0: 2 targets, initiator 0
sd1 at scsibus2 targ 1 lun 0: <SHARP, 305SH microSD, 3.14> SCSI3 0/direct removable serial.04dd97d5598055430410
uhidev0 at uhub0 port 3 configuration 1 interface 0 "WiseGroup.,Ltd JC-PS101U" rev 1.00/2.88 addr 3
uhidev0: iclass 3/0
uhid0 at uhidev0: input=7, output=3, feature=0
uhidev1 at uhub0 port 6 configuration 1 interface 0 "Logitech USB Laser Mouse" rev 2.00/56.01 addr 4
uhidev1: iclass 3/1
ums0 at uhidev1: 8 buttons, Z and W dir
wsmouse2 at ums0 mux 0
ugen0 at uhub0 port 7 "Validity Sensors VFS5011 Fingerprint Reader" rev 1.10/0.78 addr 5
ugen1 at uhub0 port 11 "Intel product 0x07dc" rev 2.00/0.01 addr 6
sdmmc0: can't enable card
uvideo0 at uhub0 port 12 configuration 1 interface 0 "SunplusIT INC. Integrated Camera" rev 2.00/0.03 addr 7
video0 at uvideo0
umass1 at uhub0 port 16 configuration 1 interface 0 "Seagate Backup+  Desk" rev 3.00/3.42 addr 8
umass1: using SCSI over Bulk-Only
scsibus3 at umass1: 2 targets, initiator 0
sd2 at scsibus3 targ 1 lun 0: <Seagate, Backup+ Desk, 0342> SCSI4 0/direct fixed
sd2: 4769307MB, 4096 bytes/sector, 1220942645 sectors
uhub3 at uhub1 port 1 configuration 1 interface 0 "Intel Rate Matching Hub" rev 2.00/0.04 addr 2
uhub4 at uhub2 port 1 configuration 1 interface 0 "Intel Rate Matching Hub" rev 2.00/0.04 addr 2
vscsi0 at root
scsibus4 at vscsi0: 256 targets
softraid0 at root
scsibus5 at softraid0: 256 targets

usbdevs:
Controller /dev/usb0:
addr 1: super speed, self powered, config 1, xHCI root hub(0x0000), Intel(0x8086), rev 1.00
 port 1 disabled
 port 2 disabled
 port 3 addr 2: low speed, power 100 mA, config 1, JC-PS101U(0x8888), WiseGroup.,Ltd(0x0925), rev 2.88
 port 4 disabled
 port 5 disabled
 port 6 addr 3: low speed, power 98 mA, config 1, USB Laser Mouse(0xc069), Logitech(0x046d), rev 56.01
 port 7 addr 4: full speed, power 100 mA, config 1, VFS5011 Fingerprint Reader(0x0017), Validity Sensors(0x138a), rev 0.78, iSerialNumber 7f178585b00e
 port 8 disabled
 port 9 disabled
 port 10 disabled
 port 11 addr 5: full speed, self powered, config 1, product 0x07dc(0x07dc), Intel(0x8087), rev 0.01
 port 12 addr 6: high speed, power 500 mA, config 1, Integrated Camera(0x0268), SunplusIT INC.(0x5986), rev 0.03
 port 13 disabled
 port 14 disabled
 port 15 disabled
 port 16 addr 7: super speed, self powered, config 1, Backup+  Desk(0xab31), Seagate(0x0bc2), rev 3.42, iSerialNumber NA7EA2SZ
Controller /dev/usb1:
addr 1: high speed, self powered, config 1, EHCI root hub(0x0000), Intel(0x8086), rev 1.00
 port 1 addr 2: high speed, self powered, config 1, Rate Matching Hub(0x8008), Intel(0x8087), rev 0.04
  port 1 powered
  port 2 powered
  port 3 powered
  port 4 powered
  port 5 powered
  port 6 powered
 port 2 powered
 port 3 powered
Controller /dev/usb2:
addr 1: high speed, self powered, config 1, EHCI root hub(0x0000), Intel(0x8086), rev 1.00
 port 1 addr 2: high speed, self powered, config 1, Rate Matching Hub(0x8000), Intel(0x8087), rev 0.04
  port 1 powered
  port 2 powered
  port 3 powered
  port 4 powered
  port 5 powered
  port 6 powered
  port 7 powered
  port 8 powered
 port 2 powered
 port 3 powered
