On Wed, Feb 05, 2020 at 05:52:00PM +0000, Peter Müller wrote:
> after experimenting with different MTU sizes and pf normalisation rules,
> I am getting the feeling of a root cause lying somewhere near path MTU
> discovery - perhaps in combination with IPsec.

A coworker ran into the same trace with routing domains and increased
interface MTU for jumbo frames.

> kernel: double fault trap, code=0
> Stopped at      rtable_l2+0xf:  pushq   %rdi
> ddb{0}> trace
> rtable_l2(0) at rtable_l2+0xf
> pf_setup_pdesc(ffff8000210e40a8,2,2,ffff80000016c400,fffffd806ee32e00,fffff80000210e41be)
>  at pf_setup_pdesc+0x7d
> pf_test(2,2,ffff80000013f000,ffff8000210e4290) at pf_test+0xfe
> ip_output(fffffd806ee32e00,0,fffffd807d95a5f8,800,0,fffffd807d95a588) at 
> ip_output+0x7cf
> tcp_output(ffff800000551980) at tcp_output+0x15c1
> tcp_output(ffff800000551980) at tcp_output+0x1914
> tcp_output(ffff800000551980) at tcp_output+0x1914
> tcp_output(ffff800000551980) at tcp_output+0x1914
> tcp_output(ffff800000551980) at tcp_output+0x1914
> tcp_output(ffff800000551980) at tcp_output+0x1914
> tcp_output(ffff800000551980) at tcp_output+0x1914
> tcp_output(ffff800000551980) at tcp_output+0x1914
> tcp_output(ffff800000551980) at tcp_output+0x1914
> tcp_output(ffff800000551980) at tcp_output+0x1914
> tcp_output(ffff800000551980) at tcp_output+0x1914
> tcp_output(ffff800000551980) at tcp_output+0x1914
> tcp_output(ffff800000551980) at tcp_output+0x1914
> tcp_output(ffff800000551980) at tcp_output+0x1914
> tcp_output(ffff800000551980) at tcp_output+0x1914
> tcp_output(ffff800000551980) at tcp_output+0x1914
> [... some identical lines omitted...]
> tcp_timer_rexmt(ffff800000551980) at tcp_timer_rexmt+0x3f5
> softclock_thread(ffff8000210d2c58) at softclock_thread+0xfb
> end trace frame: 0x0, count: -50

I have created an automated test from his setup.

pair1 interface with MTU 8000 <-> pair2 interface <-> loopback3 interface

All interfaces are in different routing domains.  Between loopback3
and pair2, pf switches the routing table.  If TCP sends from loopback3
to pair1, path MTU discovery fails and the kernel crashes at the
next rexmt timeout.

The problem is that ip_output() tries to add the PMTU route in
routing table 2, but tcp_output() expects it in routing table 3.
Then it goes back and forth until the stack is exhausted.
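
To illustrate the ping-pong, here is a toy user-space model.  It is
not kernel code: the function names, the per-table pmtu[] array, the
MTU of 1500 and the depth guard are all assumptions made up for this
sketch.  It only shows why storing the PMTU in the table pf switched
to never lets the sender shrink its segment, while storing it in the
original table does.

```c
#define NTABLES   4
#define IF_MTU    1500  /* hypothetical MTU on the outgoing path */
#define MAX_DEPTH 32    /* guard: the model stops where the kernel would
                           keep recursing until the stack is exhausted */

/* Cached path MTU for one destination, per routing table; 0 = none. */
static unsigned pmtu[NTABLES];

static void
model_reset(void)
{
	for (int i = 0; i < NTABLES; i++)
		pmtu[i] = 0;
}

/* Hypothetical stand-in for the pf rule: table 3 is switched to table 2. */
static unsigned
model_pf_rtable(unsigned rtableid)
{
	return (rtableid == 3 ? 2 : rtableid);
}

/*
 * Output path model.  Returns 0 if the packet fits, -1 (think EMSGSIZE)
 * otherwise.  With fix != 0 the PMTU is recorded in the original table,
 * otherwise in the table that pf selected.
 */
static int
model_ip_output(unsigned len, unsigned rtableid, int fix)
{
	unsigned orig = rtableid;
	unsigned cur = model_pf_rtable(rtableid);

	if (len <= IF_MTU)
		return (0);
	pmtu[fix ? orig : cur] = IF_MTU;
	return (-1);
}

/*
 * Sender model.  Looks up the PMTU in its own table, clamps the segment,
 * and retries on failure.  Returns the number of retries needed, or -1
 * if MAX_DEPTH was reached without ever sending successfully.
 */
static int
model_tcp_output(unsigned rtableid, unsigned segsize, int fix, int depth)
{
	if (depth >= MAX_DEPTH)
		return (-1);
	if (pmtu[rtableid] != 0 && pmtu[rtableid] < segsize)
		segsize = pmtu[rtableid];
	if (model_ip_output(segsize, rtableid, fix) == 0)
		return (depth);
	return (model_tcp_output(rtableid, segsize, fix, depth + 1));
}
```

In this model, model_tcp_output(3, 8000, 0, 0) runs into the depth
guard because the PMTU lands in table 2 and the sender only looks in
table 3.  After model_reset(), model_tcp_output(3, 8000, 1, 0)
succeeds on the first retry.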

This diff creates the PMTU route in the original routing table.
Then PMTU discovery works.

Do you also use routing domains?

Sorry that the fix took a year.  I needed a precise and simple
description of how to reproduce it.  Do you still have this setup?
Does my diff also fix your problem?

bluhm

Index: netinet/ip_output.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/ip_output.c,v
retrieving revision 1.363
diff -u -p -r1.363 ip_output.c
--- netinet/ip_output.c 2 Feb 2021 17:47:42 -0000       1.363
+++ netinet/ip_output.c 5 Feb 2021 00:46:02 -0000
@@ -107,7 +107,10 @@ ip_output(struct mbuf *m0, struct mbuf *
        struct sockaddr_in *dst;
        struct tdb *tdb = NULL;
        u_long mtu;
-#if defined(MROUTING)
+#if NPF > 0
+       u_int orig_rtableid;
+#endif
+#ifdef MROUTING
        int rv;
 #endif
 
@@ -150,6 +153,7 @@ ip_output(struct mbuf *m0, struct mbuf *
        }
 
 #if NPF > 0
+       orig_rtableid = m->m_pkthdr.ph_rtableid;
 reroute:
 #endif
 
@@ -480,6 +484,15 @@ sendit:
                        ipsec_adjust_mtu(m, ifp->if_mtu);
 #endif
                error = EMSGSIZE;
+#if NPF > 0
+               /* pf changed routing table, use orig rtable for path MTU */
+               if (ro->ro_tableid != orig_rtableid) {
+                       rtfree(ro->ro_rt);
+                       ro->ro_tableid = orig_rtableid;
+                       ro->ro_rt = icmp_mtudisc_clone(
+                           satosin(&ro->ro_dst)->sin_addr, ro->ro_tableid, 0);
+               }
+#endif
                /*
                 * This case can happen if the user changed the MTU
                 * of an interface after enabling IP on it.  Because
