Hi David,

Is msk the first port added to the trunk, i.e. the preferred port? If you
run tcpdump on msk, or watch systat if, do you see packets on msk?

The network config is pretty standard: an Ethernet port (msk0), a Wi-Fi one (iwn0), and trunk0 with failover (using msk0 as the "preferred" port):

$ cat /etc/hostname.trunk0
trunkproto failover
trunkport msk0
trunkport iwn0
autoconf
up

Bear with me, tcpdump is a rather strange world to me...

Please find attached the capture files produced by the following commands:

$ doas tcpdump -i trunk0 -c 50 -w trunk0.dump
$ doas tcpdump -i msk0 -c 50 -w msk0.dump
$ doas tcpdump -i iwn0 -c 50 -w iwn0.dump

I see some broadcast packets on both trunk0 and msk0 (trunk0 didn't receive an inet address from the DHCP server, of course); as expected, nothing on iwn0.

Hope this answers your question...

Cheers

On 28 Dec 2021, at 20:41, Alessandro De Laurenzis <[email protected]> wrote:

Hello David,

On 28/12/2021 08:39, [email protected] wrote:
Hi Alessandro,
Did you bisect the whole kernel or just msk(4) changes?

I did repeated installations from scratch, from 5.3 up to 7.0, so I
bisected the whole kernel.

Then, using the 7.0-stable sources, I reverted if_mskvar.h to 1.13 and
if_msk.c to 1.130, applying to the latter all the changes made after
1.131 (to keep the code buildable, since a few APIs have changed in the
meantime), and verified that the trunk(4) interface is fully functional
again. So I think the issue is in the current if_msk.c code.

Hope this helps.

All the best

On 8 Dec 2021, at 04:39, Alessandro De Laurenzis <[email protected]> wrote:

Greetings,

I recently installed OpenBSD 7.0 on an old Core 2 Duo machine (Compaq 610,
complete dmesg attached), which ran 5.5/5.6/5.7 some years ago without
any relevant issues (after that, it was used as a home server running
Debian).

mskc0 at pci4 dev 0 function 0 "Marvell Yukon 88E8042" rev 0x10, Yukon-2 FE+ rev. A0 (0x0): msi
msk0 at mskc0 port A: address 18:a9:05:94:ab:19
eephy0 at msk0 phy 0: 88E3016 10/100 PHY, rev. 0

I noticed that the trunk(4) failover protocol breaks when the Ethernet cable
is plugged in: when booting in this configuration, no lease is acquired from
the DHCP server, and switching from Wi-Fi to Ethernet breaks the connection.
In both cases, the status of trunk0 and msk0 is "no carrier".

It's worth noting that when msk0 is configured stand-alone (i.e., without
trunk(4) failover), the connection is fully functional and stable.

Since I didn't remember any similar problems with 5.x, I did some
bisecting, and my conclusion is that the functionality broke between 6.2
and 6.3, specifically with the following commit:

RCS file: /cvs/src/sys/dev/pci/if_msk.c,v
----------------------------
revision 1.131
date: 2018/01/06 03:11:04;  author: dlg;  state: Exp;  lines: +251 -311;  commitid: BhB8LisF92o4xfOK;
rework the transmit and receive paths to address reliability issues.

phessler@ has been having trouble with msk on overdrive 1000s. some
of the issues relate to the driver not coping with exhaustion of
mbufs for the rx ring, the other issues are corruption of the mcl9k
pool that msk uses.

this diff adds a timeout that the rx refill code uses when the rx
ring is empty and cannot be filled. it'll periodically retry the
ring refill until it can get some mbufs in the air again.

the current code made hunting for the mcl9k issue too hard, so this
rewrites it to be simpler and more like other drivers. there's now
just arrays of mbuf pointers and dmamaps to shadow the hardware
ring entries, and producer and consumer indexes. what was there
before had linkes lists of something to hold mbuf pointers and
dmamaps, and some way to go from the ring to go back to that. i
think, it was hard to tell what was happening.

this also copies the ADDR64 handling on the tx ring to the rx ring.
this potentially makes more rx descriptors available, but that can
happen later.

in hindsight the mcl9k problem could have been from letting if_rxr
allocate the entier ring. if every descriptor was filled, the chip
may have run around the ring when it shouldnt have. giving rxr one
less descriptor than there is on the ring may have fixed the problem
too.

this work also makes it easier to make msk mpsafe.

tested by an ok phessler@
ok kettenis@ deraadt@
=============================================================================

and the corresponding one for sys/dev/pci/if_mskvar.h (revision 1.14, same log).

On a fresh 6.3 install, which showed the issue, I reverted the two files to
revisions 1.130 and 1.13 respectively, and trunk(4) failover worked
again.

The diff is too long and complex for me to pinpoint exactly where the problem
lies, but I hope this report contains enough information to start an
analysis (I'm copying the developers involved, in case they are not
reading this list). Of course, I'm available to test any patches (on 7.0 or
-current) and provide further details if needed.

Please note that the dmesg is from OpenBSD 6.3, since that is the version
currently installed on the laptop; if you're interested in the
7.0/-current dmesg, just let me know.

All the best

--
Alessandro De Laurenzis
[mailto:[email protected]]
Web: http://www.atlantide.mooo.com
LinkedIn: http://it.linkedin.com/in/delaurenzis
<dmesg.txt>

Attachment: trunk0.dump
Description: Binary data

Attachment: msk0.dump
Description: Binary data

Attachment: iwn0.dump
Description: Binary data
