On Sun, Jan 28, 2024 at 12:16:20AM +0100, Hrvoje Popovski wrote:

> On 27.1.2024. 21:01, Marcus Glocker wrote:
> > On Sat, Jan 27, 2024 at 08:01:09AM +0100, Hrvoje Popovski wrote:
> >
> >> On 26.1.2024. 21:56, Marcus Glocker wrote:
> >>> On Fri, Jan 26, 2024 at 11:41:49AM +0100, Hrvoje Popovski wrote:
> >>>
> >>>> I've managed to reproduce the TSO em problem on another setup,
> >>>> unfortunately production.
> >>>>
> >>>> Setup is very simple
> >>>>
> >>>> em0 - carp <- uplink
> >>>> em1 - pfsync
> >>>> ix1 - vlans - carp
> >>> Would it be possible that you also share an "ifconfig -a hwfeatures" of
> >>> that box?  You can mask the IPs if it's too sensitive.
> >>>
> >>> I still try to reproduce the issue here, and for now I can't.
> >>> Maybe in your full ifconfig output I can see some specifics about your
> >>> configuration, which makes it more likely to reproduce the issue here.
> >>>
> >> Hi,
> >>
> >> here's ifconfig from second setup where watchdog is triggered much faster.
> >> Originally in this setup uplink is ix0, I've changed that to em0 to see
> >> would the problem be same as in other setup and it is, and that's good
> >> because this is pfsync setup for students and I can do whatever I want
> >> with it :)
> > Thanks.
> >
> > But still, I can do whatever I want on my em(4) I210 box, carp(4),
> > vlan(4), creating a lot of traffic, I can't reproduce the watchdog which
> > you are seeing :-(  I'm not sure if this is something related to your
> > I350.
> >
> > Also, I can't understand why the watchdog still triggers when you disable
> > TSO by setting net.inet.tcp.tso=0.
> >
> > Just to rule out that you're receiving a MAXMCLBYTES (65536) packet,
> > while EM_TSO_SIZE (65535) is one byte less, can you please apply this
> > diff to -current and test it?  I doubt it will make a difference, but
> > I'm running a bit out of ideas here.
> >
> Hi,
>
> with this diff I'm still getting em watchdog
>
> Jan 28 00:14:12 bcbnfw1 /bsd: em0: watchdog: head 120 tail 185 TDH 185 TDT 120

Thanks for testing again.

I think we might have a generic problem with TSO in the current em(4)
code on some chips.  See this recent FreeBSD commit:

	e1000: disable TSO on lem(4) and em(4):
	Disable TSO on lem(4) and em(4) until a ring stall can be debugged.

https://github.com/freebsd/freebsd-src/commit/797e480cba8834e584062092c098e60956d28180

Can you please try this diff, which disables TSO specifically for I350?

We will need to discuss internally which way to go.  I currently see
these options:

- Entirely pull out the TSO diff.
- Leave the TSO code in but disable TSO for now (what FreeBSD did).
- Leave the TSO code in but disable TSO only for chips we see issues
  with (this diff).


Index: if_em.c
===================================================================
RCS file: /cvs/src/sys/dev/pci/if_em.c,v
diff -u -p -u -p -r1.370 if_em.c
--- if_em.c	31 Dec 2023 08:42:33 -0000	1.370
+++ if_em.c	28 Jan 2024 09:30:59 -0000
@@ -2013,7 +2013,9 @@ em_setup_interface(struct em_softc *sc)
 	if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210) {
 		ifp->if_capabilities |= IFCAP_CSUM_IPv4;
 		ifp->if_capabilities |= IFCAP_CSUM_TCPv6 | IFCAP_CSUM_UDPv6;
-		ifp->if_capabilities |= IFCAP_TSOv4 | IFCAP_TSOv6;
+		/* XXX: Enabling TSO on I350 causes watchdogs */
+		if (sc->hw.mac_type != em_i350)
+			ifp->if_capabilities |= IFCAP_TSOv4 | IFCAP_TSOv6;
 	}
 
 	/*
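
For readers who want to check the effect of the third option without a
kernel tree, the gating logic can be sketched as stand-alone C.  This is
only an illustration under stated assumptions: the enum, its ordering,
the CAP_* bit values and setup_capabilities() are hypothetical stand-ins,
not the real em(4) definitions.

```c
#include <stdint.h>

/*
 * Hypothetical stand-ins for the driver's MAC type enum.  The only
 * property assumed here is that the I350 value lies inside the
 * [82575, I210] range tested by the diff.
 */
enum mac_type {
	MAC_82574,	/* below the range: no offload bits set here */
	MAC_82575,
	MAC_82576,
	MAC_82580,
	MAC_I350,
	MAC_I210
};

/* Hypothetical capability bits; the real IFCAP_* values differ. */
#define CAP_CSUM_IPV4	0x01u
#define CAP_CSUM_TCPV6	0x02u
#define CAP_CSUM_UDPV6	0x04u
#define CAP_TSOV4	0x08u
#define CAP_TSOV6	0x10u

/*
 * Mirror of the patched block: checksum offload for the whole range,
 * TSO for every chip in the range except I350.
 */
static uint32_t
setup_capabilities(enum mac_type mac)
{
	uint32_t caps = 0;

	if (mac >= MAC_82575 && mac <= MAC_I210) {
		caps |= CAP_CSUM_IPV4;
		caps |= CAP_CSUM_TCPV6 | CAP_CSUM_UDPV6;
		/* TSO stays off on I350 only. */
		if (mac != MAC_I350)
			caps |= CAP_TSOV4 | CAP_TSOV6;
	}
	return caps;
}
```

With these stand-ins, setup_capabilities(MAC_I350) keeps the checksum
bits but has neither TSO bit set, while setup_capabilities(MAC_I210)
still advertises both.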