Hello,

On Tue, Jun 22, 2021 at 10:35:38PM +0200, Alexander Bluhm wrote:
> > > Could you try this diff?
> >
> > I have a kernel running with your diff over the last hours and created
> > quite some network traffic and the error didn't appear so far.
> > Previously, I was able to create it quite fast.  So definitely an
> > improvement.
> 
> That's strange.  The diff would help only if routing domains are
> involved.  But you said that you don't have them.
> 
> tobhe@ and I could reproduce the problem on our setup.  It looks
> like using enc0 as iked interface may trigger it.  But the panic
> is not clearly reproducable, we don't see the pattern that actually
> causes the stack overflow.
> 
> Sometimes it happens when a dynamic PMTU route expires, or when
> IPsec traffic begins, or when iked is restarted, or it does not
> panic at all.  And the PMTU is not always calculated correctly.
> 
> So please continue testing and report configurations that work and
> don't work.
> 
> Our test setup uses:
> - laptop with iked as road warrior and IPv4 on enc0
> - VPN gateway as IPsec endpoint
> - host route from gateway to server with -mtu 1400
> - server running tcpbench -s
> 
> When running tcpbench from laptop it crashes.  Sometimes ...
> 

    let me share my hypothesis, which might be entirely wrong,
    because I'm not familiar enough with code. anyway...

    If I understand things right things start to go downhill
    as soon as we detect, there is not enough room to insert
    ipsec headers here in ip_output_ipsec_send():

 591         /* Check if we are allowed to fragment */
 592         ip = mtod(m, struct ip *);
 593         if (ip_mtudisc && (ip->ip_off & htons(IP_DF)) && tdb->tdb_mtu &&
 594             ntohs(ip->ip_len) > tdb->tdb_mtu &&
 595             tdb->tdb_mtutimeout > gettime()) {
 596                 struct rtentry *rt = NULL;
 597                 int rt_mtucloned = 0;
 598                 int transportmode = 0;

    note the code uses tdb->tdb_mtu to see if there is a space. If there
    is not enough room to insert ipsec headers, we create a new
    dynamic hostbased route and adjust MTU at such route. Finally
    we drop packet:

 616                 if (rt != NULL) {
 617                         rt->rt_mtu = tdb->tdb_mtu;
 618                         if (ro != NULL && ro->ro_rt != NULL) {
 619                                 rtfree(ro->ro_rt);
 620                                 ro->ro_rt = rtalloc(&ro->ro_dst, 
RT_RESOLVE,
 621                                     m->m_pkthdr.ph_rtableid);
 622                         }
 623                         if (rt_mtucloned)
 624                                 rtfree(rt);
 625                 }
 626                 ipsec_adjust_mtu(m, tdb->tdb_mtu);
 627                 m_freem(m);
 628                 return EMSGSIZE;

    the return at 628, bails out back to ip_output():

 405 #ifdef IPSEC
 406         /*
 407          * Check if the packet needs encapsulation.
 408          */
 409         if (tdb != NULL) {
 410                 /* Callee frees mbuf */
 411                 error = ip_output_ipsec_send(tdb, m, ro,
 412                     (flags & IP_FORWARDING) ? 1 : 0);
 413                 goto done;
 414         }
 ...
 519 done:
 520         if (ro == &iproute && ro->ro_rt)
 521                 rtfree(ro->ro_rt);
 522         if_put(ifp);
 523         return (error);

    the line 523 bails out to tcp_output() here:

1044                 error = ip_output(m, tp->t_inpcb->inp_options,
1045                         &tp->t_inpcb->inp_route,
1046                         (ip_mtudisc ? IP_MTUDISC : 0), NULL, tp->t_inpcb, 
0);
1047                 break;
....
1081                 if (error == EMSGSIZE) {
1082                         /*
1083                          * ip_output() will have already fixed the route
1084                          * for us.  tcp_mtudisc() will, as its last action,
1085                          * initiate retransmission, so it is important to
1086                          * not do so here.
1087                          */
1088                         tcp_mtudisc(tp->t_inpcb, -1);
1089                         return (0);
1090                 }

    The error EMSGSIZE tells us to call tcp_mtudisc()
 830 /*              
 831  * On receipt of path MTU corrections, flush old route and replace it
 832  * with the new one.  Retransmit all unacknowledged packets, to ensure
 833  * that all packets will be received.
 834  */     
 835 void            
 836 tcp_mtudisc(struct inpcb *inp, int errno)
 837 {
 838         struct tcpcb *tp = intotcpcb(inp);
 839         struct rtentry *rt;
 840         int change = 0;
 841 
 842         if (tp == NULL)
 843                 return;
 844 
 845         rt = in_pcbrtentry(inp);
 846         if (rt != NULL) {
 847                 int orig_maxseg = tp->t_maxseg;
 848 
 849                 /*
 850                  * If this was not a host route, remove and realloc.
 851                  */
 852                 if ((rt->rt_flags & RTF_HOST) == 0) {
 853                         in_rtchange(inp, errno);
 854                         if ((rt = in_pcbrtentry(inp)) == NULL)
 855                                 return;
 856                 }       
 857                 if (orig_maxseg != tp->t_maxseg ||
 858                     (rt->rt_locks & RTV_MTU))
 859                         change = 1;
 860         }
 861         tcp_mss(tp, -1);
 862 
 863         /*
 864          * Resend unacknowledged packets
 865          */
 866         tp->snd_nxt = tp->snd_una;
 867         if (change || errno > 0)
 868                 tcp_output(tp);
 869 }

    line 868 is the point, where we start recursion. If we reach the line 
    then code in tcp_mtudisc() had to execute line 859. tcp_mtudisc()
    is where I'm getting lost.

    condition at line 857:
        orig_maxseg != tp->t_maxseg

    looks suspicious to me as I don't see a mechanism how it could get
    changed between line 847 and 857. So it must rt->rt_locks & RTV_MTU
    test, which makes us to do 'change = 1'

    also looking at tcp_mss() itself, it seems to me it still uses route linked
    to pcb entry. However the route is the same, I see no place, which updates
    the route in `inpcb` by host route created by ip_output_ipsec_send().

    so we basically attempt to retransmit packets using same mss, which is
    futile.


thanks and
regards
sashan

Reply via email to