Hi!

This commit is supposed to fix a problem we do not have a reproducer for.

The problem was that sometimes a TCP connection enters tcp_discardcb()
with a scheduled timer.  A temporary patch 57e27ff07aff was committed
to mask the problem.  This patch is supposed to fix the root cause.
However, again, we were not able to reproduce the problem reliably
and thus were not able to test that the just committed patch indeed
fixes the root cause. But the assertion is put back to tcp_discardcb().
In case you got a panic in tcp_discardcb(), please email me ASAP.

On Mon, Mar 18, 2024 at 08:57:10PM +0000, Gleb Smirnoff wrote:
T> The branch main has been updated by glebius:
T> 
T> URL: 
https://cgit.FreeBSD.org/src/commit/?id=e34ea0196f4497d3f3939025aff3a89ee77b652e
T> 
T> commit e34ea0196f4497d3f3939025aff3a89ee77b652e
T> Author:     Gleb Smirnoff <gleb...@freebsd.org>
T> AuthorDate: 2024-03-18 20:57:00 +0000
T> Commit:     Gleb Smirnoff <gleb...@freebsd.org>
T> CommitDate: 2024-03-18 20:57:00 +0000
T> 
T>     tcp: clear all TCP timers in tcp_timer_stop() when in callout
T>     
T>     When a TCP callout decides to disable self, e.g. tcp_timer_2msl() calling
T>     tcp_close(), we must also clear all other possible timers.  Otherwise,
T>     upon return, the callout would be scheduled again in tcp_timer_enter().
T>     
T>     Revert 57e27ff07aff, which was a temporary partial revert of otherwise
T>     correct 62d47d73b7eb, that exposed the problem being fixed now.  Add an
T>     extra assertion in tcp_timer_enter() to check we aren't arming callout 
for
T>     a closed connection.
T>     
T>     Reviewed by:    rscheff
T> ---
T>  sys/netinet/tcp_subr.c  | 3 +--
T>  sys/netinet/tcp_timer.c | 6 ++++--
T>  2 files changed, 5 insertions(+), 4 deletions(-)
T> 
T> diff --git a/sys/netinet/tcp_subr.c b/sys/netinet/tcp_subr.c
T> index f618bc1ba04b..a6f84c297688 100644
T> --- a/sys/netinet/tcp_subr.c
T> +++ b/sys/netinet/tcp_subr.c
T> @@ -2395,10 +2395,9 @@ tcp_discardcb(struct tcpcb *tp)
T>  #endif
T>  
T>      INP_WLOCK_ASSERT(inp);
T> +    MPASS(!callout_active(&tp->t_callout));
T>      MPASS(TAILQ_EMPTY(&tp->snd_holes));
T>  
T> -    tcp_timer_stop(tp);
T> -
T>      /* free the reassembly queue, if any */
T>      tcp_reass_flush(tp);
T>  
T> diff --git a/sys/netinet/tcp_timer.c b/sys/netinet/tcp_timer.c
T> index ed50659abf8e..785f68be5621 100644
T> --- a/sys/netinet/tcp_timer.c
T> +++ b/sys/netinet/tcp_timer.c
T> @@ -881,6 +881,7 @@ tcp_timer_enter(void *xtp)
T>      if (tp_valid) {
T>              tcp_bblog_timer(tp, which, TT_PROCESSED, 0);
T>              if ((which = tcp_timer_next(tp, &precision)) != TT_N) {
T> +                    MPASS(tp->t_state > TCPS_CLOSED);
T>                      callout_reset_sbt_on(&tp->t_callout,
T>                          tp->t_timers[which], precision, tcp_timer_enter,
T>                          tp, inp_to_cpuid(inp), C_ABSOLUTE);
T> @@ -939,8 +940,7 @@ tcp_timer_active(struct tcpcb *tp, tt_which which)
T>  /*
T>   * Stop all timers associated with tcpcb.
T>   *
T> - * Called only on tcpcb destruction.  The tcpcb shall already be dropped 
from
T> - * the pcb lookup database and socket is not losing the last reference.
T> + * Called when tcpcb moves to TCPS_CLOSED.
T>   *
T>   * XXXGL: unfortunately our callout(9) is not able to fully stop a locked
T>   * callout even when only two threads are involved: the callout itself and 
the
T> @@ -963,6 +963,8 @@ tcp_timer_stop(struct tcpcb *tp)
T>  
T>              stopped = callout_stop(&tp->t_callout);
T>              MPASS(stopped == 0);
T> +            for (tt_which i = 0; i < TT_N; i++)
T> +                    tp->t_timers[i] = SBT_MAX;
T>      } else while(__predict_false(callout_stop(&tp->t_callout) == 0)) {
T>              INP_WUNLOCK(inp);
T>              kern_yield(PRI_UNCHANGED);

-- 
Gleb Smirnoff

Reply via email to