Greetings,

I recently installed OpenBSD 7.0 on an old Core 2 Duo machine (Compaq 610, complete dmesg attached), which ran 5.5/5.6/5.7 some years ago without any relevant issues (after that, it was used as a home server with Debian).

mskc0 at pci4 dev 0 function 0 "Marvell Yukon 88E8042" rev 0x10, Yukon-2 FE+ rev. A0 (0x0): msi
msk0 at mskc0 port A: address 18:a9:05:94:ab:19
eephy0 at msk0 phy 0: 88E3016 10/100 PHY, rev. 0

I noticed that trunk(4) failover is broken when the Ethernet cable is plugged in: when starting in this configuration, no lease is acquired from the DHCP server, and switching from wifi to Ethernet breaks the connection. In both cases, the status of trunk and msk0 is "no carrier".

It's worth noting that when msk0 is configured stand-alone (i.e., without trunk(4) failover), the connection works and is stable.
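
For reference, a trunk(4) failover setup of the kind described above typically looks roughly like the following. This is an illustrative sketch, not a copy of the files on this machine: iwn0 as the wireless port is taken from the dmesg, the network name and key are placeholders, and the use of "dhcp" on the trunk is an assumption.

```
# /etc/hostname.msk0 (wired port, listed first so it is the master)
up

# /etc/hostname.iwn0 (wireless port; nwid and wpakey are placeholders)
nwid mynetwork wpakey mysecretkey
up

# /etc/hostname.trunk0 (failover trunk over both ports, address via DHCP)
trunkproto failover trunkport msk0 trunkport iwn0
dhcp
```

In the stand-alone case, the "dhcp" line would go directly into /etc/hostname.msk0 and no trunk0 file would exist.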

Since I don't remember any similar problems with 5.x, I did some bisecting, and my conclusion is that the functionality broke between 6.2 and 6.3, specifically with the following commit:

RCS file: /cvs/src/sys/dev/pci/if_msk.c,v
----------------------------
revision 1.131
date: 2018/01/06 03:11:04;  author: dlg;  state: Exp;  lines: +251 -311;  
commitid: BhB8LisF92o4xfOK;
rework the transmit and receive paths to address reliability issues.

phessler@ has been having trouble with msk on overdrive 1000s. some
of the issues relate to the driver not coping with exhaustion of
mbufs for the rx ring, the other issues are corruption of the mcl9k
pool that msk uses.

this diff adds a timeout that the rx refill code uses when the rx
ring is empty and cannot be filled. it'll periodically retry the
ring refill until it can get some mbufs in the air again.

the current code made hunting for the mcl9k issue too hard, so this
rewrites it to be simpler and more like other drivers. there's now
just arrays of mbuf pointers and dmamaps to shadow the hardware
ring entries, and producer and consumer indexes. what was there
before had linkes lists of something to hold mbuf pointers and
dmamaps, and some way to go from the ring to go back to that. i
think, it was hard to tell what was happening.

this also copies the ADDR64 handling on the tx ring to the rx ring.
this potentially makes more rx descriptors available, but that can
happen later.

in hindsight the mcl9k problem could have been from letting if_rxr
allocate the entier ring. if every descriptor was filled, the chip
may have run around the ring when it shouldnt have. giving rxr one
less descriptor than there is on the ring may have fixed the problem
too.

this work also makes it easier to make msk mpsafe.

tested by an ok phessler@
ok kettenis@ deraadt@
=============================================================================

and the corresponding one for sys/dev/pci/if_mskvar.h (revision 1.14, same log).

On a fresh 6.3 install, which showed the issue, I reverted the two files to revisions 1.130 and 1.13 respectively, and trunk(4) failover worked again.
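
For anyone who wants to reproduce the revert, it can be sketched roughly as follows. This is an illustration under assumptions, not a transcript of the commands used: it assumes a CVS checkout of the 6.3 sources in /usr/src and the amd64 GENERIC.MP kernel shown in the dmesg.

```sh
# Assumption: /usr/src is a CVS checkout of the 6.3 source tree.
cd /usr/src/sys/dev/pci
cvs update -r 1.130 if_msk.c      # last revision before the rx/tx rework
cvs update -r 1.13 if_mskvar.h    # matching header revision

# Rebuild and install the kernel, then reboot into it.
cd /usr/src/sys/arch/amd64/compile/GENERIC.MP
make obj
make config
make
make install                      # run as root
```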

The diff is too long and complex for me to say exactly where the problem lies, but I hope this report contains enough information to start an analysis. I'm copying the developers involved, in case they are not reading this list. Of course, I'm available to test any patches (on 7.0 or -current) and to add further details if needed.

Please note that the dmesg is from OpenBSD 6.3, since that is the version currently installed on the laptop; if you're interested in the dmesg from 7.0 or -current, just let me know.

All the best

--
Alessandro De Laurenzis
[mailto:[email protected]]
Web: http://www.atlantide.mooo.com
LinkedIn: http://it.linkedin.com/in/delaurenzis
OpenBSD 6.3-stable (GENERIC.MP) #0: Tue Dec  7 18:09:08 CET 2021
    [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 2121990144 (2023MB)
avail mem = 2050654208 (1955MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xf284b (25 entries)
bios0: vendor Hewlett-Packard version "68PVU Ver. F.08" date 09/24/2009
bios0: Hewlett-Packard Compaq 610
acpi0 at bios0: rev 2
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP SLIC HPET APIC MCFG TCPA SSDT SSDT SSDT SSDT SSDT
acpi0: wakeup devices C0B6(S5) C10E(S3) C115(S3) C116(S3) C117(S3) C121(S3) 
C123(S3) C139(S5) C2AB(S5) C13C(S5) C2AC(S5) C13D(S0) C13F(S0) C247(S5)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 14318179 Hz
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM)2 Duo CPU T5870 @ 2.00GHz, 2194.87 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,EST,TM2,SSSE3,CX16,xTPR,PDCM,LONG,LAHF,PERF,SENSOR,MELTDOWN
cpu0: 2MB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 199MHz
cpu0: mwait min=64, max=64, C-substates=0.2.2.2.2, IBE
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Core(TM)2 Duo CPU T5870 @ 2.00GHz, 1995.01 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,EST,TM2,SSSE3,CX16,xTPR,PDCM,LONG,LAHF,PERF,SENSOR,MELTDOWN
cpu1: 2MB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
ioapic0 at mainbus0: apid 1 pa 0xfec00000, version 20, 24 pins, remapped to apid 1
acpimcfg0 at acpi0 addr 0xf8000000, bus 0-63
acpiprt0 at acpi0: bus 2 (C0B6)
acpiprt1 at acpi0: bus 8 (C125)
acpiprt2 at acpi0: bus 16 (C139)
acpiprt3 at acpi0: bus 40 (C13C)
acpiprt4 at acpi0: bus 48 (C13D)
acpiprt5 at acpi0: bus 0 (C003)
acpiec0 at acpi0
acpicpu0 at acpi0: !C3(250@17 mwait.3@0x20), !C2(500@1 mwait.1@0x10), C1(1000@1 
mwait.1), PSS
acpicpu1 at acpi0: !C3(250@17 mwait.3@0x20), !C2(500@1 mwait.1@0x10), C1(1000@1 
mwait.1), PSS
acpipwrres0 at acpi0: C27C, resource for C277
acpipwrres1 at acpi0: C289, resource for C27D
acpipwrres2 at acpi0: C2A5, resource for C2A3
acpipwrres3 at acpi0: C1CE, resource for C13E
acpipwrres4 at acpi0: C3C1, resource for C3C6
acpipwrres5 at acpi0: C3C2, resource for C3C7
acpipwrres6 at acpi0: C3C3, resource for C3C8
acpipwrres7 at acpi0: C3C4, resource for C3C9
acpipwrres8 at acpi0: C3C5, resource for C3CA
acpitz0 at acpi0: critical temperature is 105 degC
acpitz1 at acpi0: critical temperature is 107 degC
acpitz2 at acpi0: critical temperature is 110 degC
acpitz3 at acpi0: critical temperature is 256 degC
acpitz4 at acpi0: critical temperature is 107 degC
"PNP0A06" at acpi0 not configured
"SYN0159" at acpi0 not configured
"HPQ0006" at acpi0 not configured
acpibat0 at acpi0: C245 model "Primary" serial 00588 2020/10/31 type LIon oem 
"Hewlett-Packard"
acpiac0 at acpi0: AC unit online
acpibtn0 at acpi0: C2BE
acpibtn1 at acpi0: C15B
"PNP0C32" at acpi0 not configured
"PNP0C14" at acpi0 not configured
"PNP0C0B" at acpi0 not configured
"PNP0C0B" at acpi0 not configured
"PNP0C0B" at acpi0 not configured
"PNP0C0B" at acpi0 not configured
"PNP0C0B" at acpi0 not configured
acpivideo0 at acpi0: C09E
acpivout0 at acpivideo0: C1B5
cpu0: Enhanced SpeedStep 2194 MHz: speeds: 2001, 2000, 1600, 1200, 800 MHz
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel GME965 Host" rev 0x0c
inteldrm0 at pci0 dev 2 function 0 "Intel GME965 Video" rev 0x0c
drm0 at inteldrm0
intagp0 at inteldrm0
agp0 at intagp0: aperture at 0xd0000000, size 0x10000000
inteldrm0: msi
inteldrm0: 848x480, 32bpp
wsdisplay0 at inteldrm0 mux 1: console (std, vt100 emulation)
wsdisplay0: screen 1-5 added (std, vt100 emulation)
"Intel GME965 Video" rev 0x0c at pci0 dev 2 function 1 not configured
uhci0 at pci0 dev 26 function 0 "Intel 82801H USB" rev 0x03: apic 1 int 16
ehci0 at pci0 dev 26 function 7 "Intel 82801H USB" rev 0x03: apic 1 int 18
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 configuration 1 interface 0 "Intel EHCI root hub" rev 2.00/1.00 
addr 1
azalia0 at pci0 dev 27 function 0 "Intel 82801H HD Audio" rev 0x03: msi
azalia0: codecs: IDT 92HD75B1/2, AT&T/Lucent/0x1040, using IDT 92HD75B1/2
audio0 at azalia0
ppb0 at pci0 dev 28 function 0 "Intel 82801H PCIE" rev 0x03
pci1 at ppb0 bus 8
ppb1 at pci0 dev 28 function 1 "Intel 82801H PCIE" rev 0x03: msi
pci2 at ppb1 bus 16
iwn0 at pci2 dev 0 function 0 "Intel WiFi Link 5100" rev 0x00: msi, MIMO 1T2R, 
MoW, address 00:26:c6:01:58:12
ppb2 at pci0 dev 28 function 4 "Intel 82801H PCIE" rev 0x03: msi
pci3 at ppb2 bus 40
ppb3 at pci0 dev 28 function 5 "Intel 82801H PCIE" rev 0x03
pci4 at ppb3 bus 48
mskc0 at pci4 dev 0 function 0 "Marvell Yukon 88E8042" rev 0x10, Yukon-2 FE+ rev. A0 (0x0): msi
msk0 at mskc0 port A: address 18:a9:05:94:ab:19
eephy0 at msk0 phy 0: 88E3016 10/100 PHY, rev. 0
uhci1 at pci0 dev 29 function 0 "Intel 82801H USB" rev 0x03: apic 1 int 20
uhci2 at pci0 dev 29 function 1 "Intel 82801H USB" rev 0x03: apic 1 int 21
uhci3 at pci0 dev 29 function 2 "Intel 82801H USB" rev 0x03: apic 1 int 18
ehci1 at pci0 dev 29 function 7 "Intel 82801H USB" rev 0x03: apic 1 int 20
usb1 at ehci1: USB revision 2.0
uhub1 at usb1 configuration 1 interface 0 "Intel EHCI root hub" rev 2.00/1.00 
addr 1
ppb4 at pci0 dev 30 function 0 "Intel 82801BAM Hub-to-PCI" rev 0xf3
pci5 at ppb4 bus 2
pcib0 at pci0 dev 31 function 0 "Intel 82801HBM LPC" rev 0x03
pciide0 at pci0 dev 31 function 1 "Intel 82801HBM IDE" rev 0x03: DMA, channel 0 
configured to compatibility, channel 1 configured to compatibility
pciide0: channel 0 disabled (no drives)
pciide0: channel 1 ignored (disabled)
ahci0 at pci0 dev 31 function 2 "Intel 82801HBM AHCI" rev 0x03: msi, AHCI 1.1
ahci0: port 0: 3.0Gb/s
ahci0: port 2: 1.5Gb/s
scsibus1 at ahci0: 32 targets
sd0 at scsibus1 targ 0 lun 0: <ATA, Hitachi HTS72323, FC4O> SCSI3 0/direct 
fixed naa.5000cca582d98f33
sd0: 305245MB, 512 bytes/sector, 625142448 sectors
cd0 at scsibus1 targ 2 lun 0: <hp, CDDVDW TS-L633M, 0301> ATAPI 5/cdrom 
removable
usb2 at uhci0: USB revision 1.0
uhub2 at usb2 configuration 1 interface 0 "Intel UHCI root hub" rev 1.00/1.00 
addr 1
usb3 at uhci1: USB revision 1.0
uhub3 at usb3 configuration 1 interface 0 "Intel UHCI root hub" rev 1.00/1.00 
addr 1
usb4 at uhci2: USB revision 1.0
uhub4 at usb4 configuration 1 interface 0 "Intel UHCI root hub" rev 1.00/1.00 
addr 1
usb5 at uhci3: USB revision 1.0
uhub5 at usb5 configuration 1 interface 0 "Intel UHCI root hub" rev 1.00/1.00 
addr 1
isa0 at pcib0
isadma0 at isa0
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
wsmouse0 at pms0 mux 0
pms0: Synaptics touchpad, firmware 7.2, 0x1c0b1 0xa40000
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
vblank wait timed out on crtc 1
vblank wait timed out on crtc 1
vblank wait timed out on crtc 1
vblank wait timed out on crtc 1
vblank wait timed out on crtc 1
uvideo0 at uhub1 port 5 configuration 1 interface 0 "Chicony Electronics Co., 
Ltd. CNF8243" rev 2.00/85.39 addr 2
video0 at uvideo0
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
root on sd0a (ebe6de906ed996dc.a) swap on sd0b dump on sd0b
