Hi Alessandro,

Did you bisect the whole kernel or just msk(4) changes?

Cheers,
dlg

> On 8 Dec 2021, at 04:39, Alessandro De Laurenzis <[email protected]> 
> wrote:
> 
> Greetings,
> 
> I recently installed OpenBSD 7.0 on an old Core 2 Duo machine (Compaq 610,
> complete dmesg attached), which ran OpenBSD 5.5/5.6/5.7 some years ago
> without any relevant issues (after that, it was used as a home server with
> Debian).
> 
>> mskc0 at pci4 dev 0 function 0 "Marvell Yukon 88E8042" rev 0x10, Yukon-2 FE+ 
>> rev. A0 (0x0): msi
>> msk0 at mskc0 port A: address 18:a9:05:94:ab:19
>> eephy0 at msk0 phy 0: 88E3016 10/100 PHY, rev. 0
> 
> I noticed that the trunk(4) failover protocol is broken when the Ethernet
> cable is plugged in: if the machine starts with the cable connected, no lease
> is acquired from the DHCP server, and switching from wifi to Ethernet breaks
> the connection. In both cases, the status of trunk and msk0 is "no carrier".
> 
> It's worth noting that when msk0 is configured "stand-alone" (i.e.,
> without trunk(4) failover), the connection works and is stable.
> 
> Since I didn't remember any similar problems showing up with 5.x, I did a
> bit of bisecting, and my conclusion is that the functionality broke between
> 6.2 and 6.3, specifically with the following commit:
> 
>> RCS file: /cvs/src/sys/dev/pci/if_msk.c,v
>> ----------------------------
>> revision 1.131
>> date: 2018/01/06 03:11:04;  author: dlg;  state: Exp;  lines: +251 -311;  
>> commitid: BhB8LisF92o4xfOK;
>> rework the transmit and receive paths to address reliability issues.
>> phessler@ has been having trouble with msk on overdrive 1000s. some
>> of the issues relate to the driver not coping with exhaustion of
>> mbufs for the rx ring, the other issues are corruption of the mcl9k
>> pool that msk uses.
>> this diff adds a timeout that the rx refill code uses when the rx
>> ring is empty and cannot be filled. it'll periodically retry the
>> ring refill until it can get some mbufs in the air again.
>> the current code made hunting for the mcl9k issue too hard, so this
>> rewrites it to be simpler and more like other drivers. there's now
>> just arrays of mbuf pointers and dmamaps to shadow the hardware
>> ring entries, and producer and consumer indexes. what was there
>> before had linked lists of something to hold mbuf pointers and
>> dmamaps, and some way to go from the ring to go back to that. i
>> think, it was hard to tell what was happening.
>> this also copies the ADDR64 handling on the tx ring to the rx ring.
>> this potentially makes more rx descriptors available, but that can
>> happen later.
>> in hindsight the mcl9k problem could have been from letting if_rxr
>> allocate the entire ring. if every descriptor was filled, the chip
>> may have run around the ring when it shouldn't have. giving rxr one
>> less descriptor than there is on the ring may have fixed the problem
>> too.
>> this work also makes it easier to make msk mpsafe.
>> tested by and ok phessler@
>> ok kettenis@ deraadt@
>> =============================================================================
> 
> and the corresponding one for sys/dev/pci/if_mskvar.h (revision 1.14, same 
> log).
> 
> On a fresh 6.3 install, which was showing the issue, I reverted the 2 files 
> to the revisions 1.130 and 1.13 respectively, observing a functional trunk(4) 
> failover again.
> 
> The diff is too long and complex for me to say exactly where the problem
> lies, but I hope this report contains enough information to start an
> analysis (I'm copying the developers involved, in case they are not
> reading this list). Of course, I'm available to test any patches (on 7.0 or
> -current) and to add further details if needed.
> 
> Please note that the dmesg is from OpenBSD 6.3, since that is the version
> currently installed on the laptop; if you're interested in the dmesg from
> 7.0 or -current, just let me know.
> 
> All the best
> 
> -- 
> Alessandro De Laurenzis
> [mailto:[email protected]]
> Web: http://www.atlantide.mooo.com
> LinkedIn: http://it.linkedin.com/in/delaurenzis
> <dmesg.txt>
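
For readers following the quoted commit log, the ring layout it describes (an array that shadows the hardware ring entries, producer and consumer indexes, and the idea of giving the ring one less descriptor than it has) can be sketched roughly as below. This is a minimal user-space illustration only, not code from if_msk.c: the names `rx_ring`, `rx_fill`, `rx_complete` and the size `RX_RING_CNT` are all hypothetical, and plain `void *` pointers stand in for mbufs and dmamaps.

```c
#include <assert.h>
#include <stddef.h>

#define RX_RING_CNT 8	/* hypothetical ring size, chosen for illustration */

/* Hypothetical stand-in for a driver's rx ring state. */
struct rx_ring {
	void		*slot[RX_RING_CNT];	/* shadows the hardware ring entries */
	unsigned int	 prod;			/* next slot the driver fills */
	unsigned int	 cons;			/* next slot to be completed */
	unsigned int	 inuse;			/* slots currently handed to the chip */
};

/*
 * Post up to `avail` buffers, but never let more than RX_RING_CNT - 1
 * slots be outstanding, so the producer cannot wrap onto the consumer.
 * Returns the number of buffers actually posted.
 */
static unsigned int
rx_fill(struct rx_ring *r, void **bufs, unsigned int avail)
{
	unsigned int n = 0;

	while (n < avail && r->inuse < RX_RING_CNT - 1) {
		r->slot[r->prod] = bufs[n++];
		r->prod = (r->prod + 1) % RX_RING_CNT;
		r->inuse++;
	}
	return n;
}

/* Take back the oldest posted buffer, or NULL if nothing is outstanding. */
static void *
rx_complete(struct rx_ring *r)
{
	void *buf;

	if (r->inuse == 0)
		return NULL;

	buf = r->slot[r->cons];
	r->slot[r->cons] = NULL;
	r->cons = (r->cons + 1) % RX_RING_CNT;
	r->inuse--;
	assert(r->inuse < RX_RING_CNT);	/* invariant: never a full ring */
	return buf;
}
```

In this sketch, a call to `rx_fill` that posts nothing while `inuse` is zero corresponds to the situation where the commit log says a timeout periodically retries the refill. How the real driver schedules that retry is not shown here.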
