ok. is msk the first port added to the trunk, ie, is it the preferred port? if you run tcpdump on msk or watch systat if, do you see packets on msk?
dlg

> On 28 Dec 2021, at 20:41, Alessandro De Laurenzis <[email protected]>
> wrote:
>
> Hello David,
>
> On 28/12/2021 08:39, [email protected] wrote:
>> Hi Alessandro,
>> Did you bisect the whole kernel or just msk(4) changes?
>
> I made repeated installations from scratch, starting from 5.3 till 7.0, so I
> bisected the whole kernel.
>
> But then, using the 7.0-stable sources, I reverted if_mskvar.h to 1.13 and
> if_msk.c to 1.130, applying to the latter all the modifications subsequent to
> 1.131 (in order to make the code usable, since there have been a few API
> changes meanwhile), and verified that trunk(4) i/f is still fully functional,
> so I think the issue is in the current if_msk.c code.
>
> Hope this helps.
>
> All the best
>
>>> On 8 Dec 2021, at 04:39, Alessandro De Laurenzis
>>> <[email protected]> wrote:
>>>
>>> Greetings,
>>>
>>> I recently installed OpenBSD 7.0 on an old CoreDuo2 machine (Compaq 610,
>>> complete dmesg in attach), which was powered by 5.5/5.6/5.7 some years ago,
>>> without any relevant issues (after that, it has been used as home server
>>> with Debian).
>>>
>>>> mskc0 at pci4 dev 0 function 0 "Marvell Yukon 88E8042" rev 0x10, Yukon-2
>>>> FE+ rev. A0 (0x0): msi
>>>> msk0 at mskc0 port A: address 18:a9:05:94:ab:19
>>>> eephy0 at msk0 phy 0: 88E3016 10/100 PHY, rev. 0
>>>
>>> I noticed that the trunk(4) failover protocol is broken when the Ethernet
>>> cable is plugged in (starting in this configuration, no lease is acquired
>>> from DHCP server, switching to Ethernet from wifi breaks the connection; in
>>> both cases, trunk and msk0 status is: no carrier).
>>>
>>> It's worth noting that when msk0 is configured as "stand-alone" (i.e.,
>>> without trunk(4) failover), the connection is pretty functional and stable.
>>>
>>> Since I didn't remember any similar problems showing up with 5.x, I made a
>>> bit of bisecting, and my conclusion is that the functionality got broken
>>> b/w 6.2 and 6.3 and, specifically, after the following commit:
>>>
>>>> RCS file: /cvs/src/sys/dev/pci/if_msk.c,v
>>>> ----------------------------
>>>> revision 1.131
>>>> date: 2018/01/06 03:11:04; author: dlg; state: Exp; lines: +251 -311;
>>>> commitid: BhB8LisF92o4xfOK;
>>>> rework the transmit and receive paths to address reliability issues.
>>>> phessler@ has been having trouble with msk on overdrive 1000s. some
>>>> of the issues relate to the driver not coping with exhaustion of
>>>> mbufs for the rx ring, the other issues are corruption of the mcl9k
>>>> pool that msk uses.
>>>> this diff adds a timeout that the rx refill code uses when the rx
>>>> ring is empty and cannot be filled. it'll periodically retry the
>>>> ring refill until it can get some mbufs in the air again.
>>>> the current code made hunting for the mcl9k issue too hard, so this
>>>> rewrites it to be simpler and more like other drivers. there's now
>>>> just arrays of mbuf pointers and dmamaps to shadow the hardware
>>>> ring entries, and producer and consumer indexes. what was there
>>>> before had linkes lists of something to hold mbuf pointers and
>>>> dmamaps, and some way to go from the ring to go back to that. i
>>>> think, it was hard to tell what was happening.
>>>> this also copies the ADDR64 handling on the tx ring to the rx ring.
>>>> this potentially makes more rx descriptors available, but that can
>>>> happen later.
>>>> in hindsight the mcl9k problem could have been from letting if_rxr
>>>> allocate the entier ring. if every descriptor was filled, the chip
>>>> may have run around the ring when it shouldnt have. giving rxr one
>>>> less descriptor than there is on the ring may have fixed the problem
>>>> too.
>>>> this work also makes it easier to make msk mpsafe.
>>>> tested by an ok phessler@
>>>> ok kettenis@ deraadt@
>>>> =============================================================================
>>>
>>> and the corresponding one for sys/dev/pci/if_mskvar.h (revision 1.14, same
>>> log).
>>>
>>> On a fresh 6.3 install, which was showing the issue, I reverted the 2 files
>>> to the revisions 1.130 and 1.13 respectively, observing a functional
>>> trunk(4) failover again.
>>>
>>> The diff is too long and complex, so I cannot say where the problem lies
>>> exactly, but I hope this report contains enough information to start an
>>> analysis (I'm copying the involved developers, just in case they are not
>>> reading this list); of course, I'm available to test any patches (on 7.0 or
>>> -current) and add further details if needed.
>>>
>>> Please note that the dmesg is from OBSD 6.3, since that is the version
>>> currently installed on the laptop; in case you're interested in the
>>> 7.0/current's dmesg, just let me know.
>>>
>>> All the best
>>>
>>> --
>>> Alessandro De Laurenzis
>>> [mailto:[email protected]]
>>> Web: http://www.atlantide.mooo.com
>>> LinkedIn: http://it.linkedin.com/in/delaurenzis
>>> <dmesg.txt>
>
> --
> Alessandro De Laurenzis
> [mailto:[email protected]]
> Web: http://www.atlantide.mooo.com
> LinkedIn: http://it.linkedin.com/in/delaurenzis
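The rx ring bookkeeping described in the quoted revision 1.131 commit log (arrays shadowing the hardware descriptors, plus producer and consumer indexes) can be sketched in plain C. This is a userland illustration under stated assumptions, not msk(4) code: RX_RING_CNT, struct rx_ring and the rx_ring_* functions are hypothetical names, and a void * stands in for each mbuf/dmamap pair. It also shows the convention the commit log speculates about in hindsight, handing out one less descriptor than the ring holds. In a generic ring buffer this keeps a full ring (prod one slot behind cons) distinguishable from an empty one (prod == cons); whether that is what actually went wrong on the msk hardware is not established in this thread.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical userland sketch, NOT msk(4) code: an array shadows the
 * hardware descriptors, and producer/consumer indexes walk around it.
 * RX_RING_CNT, struct rx_ring and rx_ring_* are illustrative names.
 */
#define RX_RING_CNT 8

struct rx_ring {
	void		*slot[RX_RING_CNT];	/* stands in for mbuf + dmamap */
	unsigned int	 prod;			/* next slot the driver fills */
	unsigned int	 cons;			/* next slot the driver reaps */
};

/* Number of slots currently handed to the (imaginary) chip. */
static unsigned int
rx_ring_used(const struct rx_ring *r)
{
	assert(r->prod < RX_RING_CNT && r->cons < RX_RING_CNT);
	return (r->prod + RX_RING_CNT - r->cons) % RX_RING_CNT;
}

/*
 * Slots the driver may still fill. One slot is always held back, so
 * prod == cons only ever means "empty", never "completely full".
 */
static unsigned int
rx_ring_space(const struct rx_ring *r)
{
	return RX_RING_CNT - 1 - rx_ring_used(r);
}

/* Post buffers until the ring is as full as allowed; returns the count. */
static unsigned int
rx_ring_fill(struct rx_ring *r, void *buf)
{
	unsigned int n = 0;

	while (rx_ring_space(r) > 0) {
		r->slot[r->prod] = buf;
		r->prod = (r->prod + 1) % RX_RING_CNT;
		n++;
	}
	return n;
}

/* Reap one posted slot; returns NULL if the ring is empty. */
static void *
rx_ring_take(struct rx_ring *r)
{
	void *buf;

	if (r->prod == r->cons)
		return NULL;
	buf = r->slot[r->cons];
	r->slot[r->cons] = NULL;
	r->cons = (r->cons + 1) % RX_RING_CNT;
	return buf;
}
```

Filling an empty 8-slot ring with rx_ring_fill() posts 7 buffers, so prod and cons never meet while buffers are outstanding, and rx_ring_take() returns NULL only after every posted buffer has been reaped.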
