Hello,
On Tue, Jun 22, 2021 at 10:35:38PM +0200, Alexander Bluhm wrote:
> > > Could you try this diff?
> >
> > I have a kernel running with your diff over the last hours and created
> > quite some network traffic and the error didn't appear so far.
> > Previously, I was able to create it quite fast. So definitely an
> > improvement.
>
> That's strange. The diff would help only if routing domains are
> involved. But you said that you don't have them.
>
> tobhe@ and I could reproduce the problem on our setup. It looks
> like using enc0 as iked interface may trigger it. But the panic
> is not clearly reproducable, we don't see the pattern that actually
> causes the stack overflow.
>
> Sometimes it happens when a dynamic PMTU route expires, or when
> IPsec traffic begins, or when iked is restarted, or it does not
> panic at all. And the PMTU is not always calculated correctly.
>
> So please continue testing and report configurations that work and
> don't work.
>
> Our test setup uses:
> - laptop with iked as road warrior and IPv4 on enc0
> - VPN gateway as IPsec endpoint
> - host route from gateway to server with -mtu 1400
> - server running tcpbench -s
>
> When running tcpbench from laptop it crashes. Sometimes ...
>
let me share my hypothesis, which might be entirely wrong,
because I'm not familiar enough with code. anyway...
If I understand things right things start to go downhill
as soon as we detect, there is not enough room to insert
ipsec headers here in ip_output_ipsec_send():
591 /* Check if we are allowed to fragment */
592 ip = mtod(m, struct ip *);
593 if (ip_mtudisc && (ip->ip_off & htons(IP_DF)) && tdb->tdb_mtu &&
594 ntohs(ip->ip_len) > tdb->tdb_mtu &&
595 tdb->tdb_mtutimeout > gettime()) {
596 struct rtentry *rt = NULL;
597 int rt_mtucloned = 0;
598 int transportmode = 0;
note the code uses tdb->tdb_mtu to see if there is a space. If there
is not enough room to insert ipsec headers, we create a new
dynamic hostbased route and adjust MTU at such route. Finally
we drop packet:
616 if (rt != NULL) {
617 rt->rt_mtu = tdb->tdb_mtu;
618 if (ro != NULL && ro->ro_rt != NULL) {
619 rtfree(ro->ro_rt);
620 ro->ro_rt = rtalloc(&ro->ro_dst,
RT_RESOLVE,
621 m->m_pkthdr.ph_rtableid);
622 }
623 if (rt_mtucloned)
624 rtfree(rt);
625 }
626 ipsec_adjust_mtu(m, tdb->tdb_mtu);
627 m_freem(m);
628 return EMSGSIZE;
the return at 628, bails out back to ip_output():
405 #ifdef IPSEC
406 /*
407 * Check if the packet needs encapsulation.
408 */
409 if (tdb != NULL) {
410 /* Callee frees mbuf */
411 error = ip_output_ipsec_send(tdb, m, ro,
412 (flags & IP_FORWARDING) ? 1 : 0);
413 goto done;
414 }
...
519 done:
520 if (ro == &iproute && ro->ro_rt)
521 rtfree(ro->ro_rt);
522 if_put(ifp);
523 return (error);
the line 523 bails out to tcp_output() here:
1044 error = ip_output(m, tp->t_inpcb->inp_options,
1045 &tp->t_inpcb->inp_route,
1046 (ip_mtudisc ? IP_MTUDISC : 0), NULL, tp->t_inpcb,
0);
1047 break;
....
1081 if (error == EMSGSIZE) {
1082 /*
1083 * ip_output() will have already fixed the route
1084 * for us. tcp_mtudisc() will, as its last action,
1085 * initiate retransmission, so it is important to
1086 * not do so here.
1087 */
1088 tcp_mtudisc(tp->t_inpcb, -1);
1089 return (0);
1090 }
The error EMSGSIZE tells us to call tcp_mtudisc()
830 /*
831 * On receipt of path MTU corrections, flush old route and replace it
832 * with the new one. Retransmit all unacknowledged packets, to ensure
833 * that all packets will be received.
834 */
835 void
836 tcp_mtudisc(struct inpcb *inp, int errno)
837 {
838 struct tcpcb *tp = intotcpcb(inp);
839 struct rtentry *rt;
840 int change = 0;
841
842 if (tp == NULL)
843 return;
844
845 rt = in_pcbrtentry(inp);
846 if (rt != NULL) {
847 int orig_maxseg = tp->t_maxseg;
848
849 /*
850 * If this was not a host route, remove and realloc.
851 */
852 if ((rt->rt_flags & RTF_HOST) == 0) {
853 in_rtchange(inp, errno);
854 if ((rt = in_pcbrtentry(inp)) == NULL)
855 return;
856 }
857 if (orig_maxseg != tp->t_maxseg ||
858 (rt->rt_locks & RTV_MTU))
859 change = 1;
860 }
861 tcp_mss(tp, -1);
862
863 /*
864 * Resend unacknowledged packets
865 */
866 tp->snd_nxt = tp->snd_una;
867 if (change || errno > 0)
868 tcp_output(tp);
869 }
line 868 is the point, where we start recursion. If we reach the line
then code in tcp_mtudisc() had to execute line 859. tcp_mtudisc()
is where I'm getting lost.
condition at line 857:
orig_maxseg != tp->t_maxseg
looks suspicious to me as I don't see a mechanism how it could get
changed between line 847 and 857. So it must rt->rt_locks & RTV_MTU
test, which makes us to do 'change = 1'
also looking at tcp_mss() itself, it seems to me it still uses route linked
to pcb entry. However the route is the same, I see no place, which updates
the route in `inpcb` by host route created by ip_output_ipsec_send().
so we basically attempt to retransmit packets using same mss, which is
futile.
thanks and
regards
sashan