Re: TCP sends 9KB segments via netgraph tunnel despite MTU/MSS — TSO-related?

Konstantin Belousov Wed, 14 May 2025 16:18:55 -0700

On Wed, May 14, 2025 at 10:45:27PM +0300, Ivan wrote:
> Hello,
> 
> I've been investigating a network issue that took quite some time to trace. I 
> still cannot reproduce it in a test environment, but it consistently occurs 
> on a specific FreeBSD server with a more complex network configuration.
> 
> Summary of the issue:  
> Under certain conditions, the system attempts to send TCP packets larger than 
> 9 KB through a netgraph-based tunnel with MTU 1472, even though MSS was 
> negotiated to 1400.
> 
> This happens when the initial route is via the default uplink, but PF then 
> re-routes the packet via the netgraph tunnel using `route-to`. If the traffic 
> is routed through ng0 directly (without PF), the issue does not occur. The 
> problem also disappears if TSO is disabled on the uplink NIC.
> 
> System:
>   FreeBSD 13.5-RELEASE
>   releng/13.5-n259162-882b9f3f2218 GENERIC amd64
> 
> Interfaces:
> 
> - Primary LAN interface (where disabling TSO fixes the problem):
>     igb0, MTU 1500  
>     options=4e520bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,
>                     VLAN_HWCSUM,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,
>                     RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
> 
> - Internet uplink:
>     onp, VLAN over igb0, MTU 1500  
>     options=4600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
> 
> - Netgraph tunnel:
>     ng0, MTU 1472  
>     inet 10.10.0.1 → 10.10.0.2
> 
> PF rules used for re-routing:
>     nat log(all) on onp inet from 10.10.0.1 to any tag NG -> (ng0) round-robin
>     pass out quick on onp route-to (ng0 10.10.0.2) inet all flags S/SA keep state tagged NG
> 
> Packet trace (via pflog during a POST request ~10KB to YouTube):
> 
>     15:46:01.784956 IP 10.10.0.1.62031 > 209.85.233.198.443: Flags [P.], seq 597:9703, length 9106
>     15:46:01.785020 IP 127.0.0.1 > 10.10.0.1: ICMP 209.85.233.198 unreachable - need to frag (mtu 1472)
> 
> This shows the kernel trying to send a 9106-byte segment over a link that 
> clearly can't handle it. The MSS was already negotiated at 1400, so this 
> seems unexpected. The ICMP response is generated locally. The result is 
> segment loss, out-of-order retransmissions, and poor TLS performance.
> 
> I also reproduced this behavior with OpenVPN — so the issue is not 
> netgraph-specific.
> 
> Questions:
> - Is this expected behavior due to TSO interacting poorly with PF route-to?
> - Should TSO respect the effective MTU based on the post-PF routing decision?
> - Or is this a bug in the TCP offload path?


The TCP output code decides whether to use TSO based on the capabilities of
the outgoing interface, which is looked up through the routing table.  It has
no knowledge, and probably should not have, of a packet filter mangling the
packets after they are passed to ip_output() (and later).

BTW, I think that route-to would similarly break TLS and IPSEC inline offloads.
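
To make the mismatch concrete, here is a rough Python model of the decision
described above. It is only a sketch, not the kernel code: Interface,
tcp_chunk_len, can_transmit and wire_segments are hypothetical names, the TSO
flag on the routed interface is assumed to be set, the 40-byte header size
assumes IPv4 and TCP without options, and the limits the real stack places on
TSO burst size are ignored.

```python
from dataclasses import dataclass

IP_TCP_HEADERS = 40  # assumption: IPv4 + TCP headers, no options


@dataclass
class Interface:
    """Hypothetical stand-in for a network interface, not a kernel struct."""
    name: str
    mtu: int
    tso: bool  # True if the interface advertises TSO


def tcp_chunk_len(route_if: Interface, pending: int, mss: int) -> int:
    """Payload bytes the TCP layer hands down as a single packet.

    The decision looks only at the interface found through the routing
    table. With TSO the whole pending chunk goes down in one piece and the
    NIC is expected to cut it into MSS-sized segments.
    """
    if route_if.tso:
        return pending
    return min(pending, mss)


def can_transmit(out_if: Interface, payload: int) -> bool:
    """True if the interface that finally sends the packet can take it."""
    return out_if.tso or payload + IP_TCP_HEADERS <= out_if.mtu


def wire_segments(payload: int, mss: int) -> int:
    """Number of MSS-sized segments a TSO-capable NIC would have produced."""
    return -(-payload // mss)  # ceiling division


if __name__ == "__main__":
    uplink = Interface("onp", 1500, tso=True)   # route lookup result (TSO assumed)
    tunnel = Interface("ng0", 1472, tso=False)  # where route-to diverts the packet
    chunk = tcp_chunk_len(uplink, pending=9106, mss=1400)
    print(chunk, wire_segments(chunk, 1400), can_transmit(tunnel, chunk))
```

With the figures from the trace, the model hands down one 9106-byte chunk that
the uplink would have split into 7 segments, and ng0 cannot send it. With the
TSO flag cleared on the routed interface, the chunk is capped at the 1400-byte
MSS and fits within the 1472-byte tunnel MTU.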
