Li, Ji <j...@akamai.com> wrote:
> In Linux 3.17 and earlier, tcp_init_congestion_ops (i.e. tcp_reno) is
> used as the ca_ops during 3WHS, and after 3WHS, ca_ops is assigned as 
> the default congestion control set by sysctl and immediately its parameters
> stored in icsk_ca_priv[] are initialized. Commit 55d8694fa82c ("net:
> tcp: assign tcp cong_ops when tcp sk is created") splits assignment and
> initialization into two steps: assignment is done before SYN or SYN-ACK
> is sent out; initialization is done after 3WHS (assume without
> fastopen). But this can cause out-of-order invocation for ca_ops functions
> other than .init() during 3WHS, as they could be called before its
> parameters get initialized. It may cause unexpected behavior for
> congestion controls, and make troubles for those that need dynamic
> object allocation, like tcp_cdg etc.

What exactly is the problem?
Kernel crash?

AFAICS cdg can cope with NULL ca->gradients.

> We used tcp_dctcp as an example to visualize the problem, and set it as
> default congestion control via sysctl. Three parameters
> (ca->prior_snd_una, ca->prior_rcv_nxt, ca->dctcp_alpha) were monitored
> when functions, such as dctcp_update_alpha() and dctcp_ssthresh(), are
> called during 3WHS. All of three are found to be zero, which is likely
> impossible if dctcp_init() was called ahead, where those three
> parameters should be initialized. Some other congestion controls are
> examined too and the same problem was reproduced.

Why is this a problem?

> diff --git a/include/net/tcp.h b/include/net/tcp.h
> +{
> +       if (inet_csk(sk)->icsk_ca_initialized)
> +               return inet_csk(sk)->icsk_ca_ops->ssthresh(sk);
> +       else
> +               return tcp_reno_ssthresh(sk);
> +}
> +
>  /* Enter Loss state. If we detect SACK reneging, forget all SACK information
>   * and reset tags completely, otherwise preserve SACKs. If receiver
>   * dropped its ofo queue, we will know this due to reneging detection.
> @@ -1896,7 +1904,7 @@ void tcp_enter_loss(struct sock *sk)
>             !after(tp->high_seq, tp->snd_una) ||
>             (icsk->icsk_ca_state == TCP_CA_Loss && !icsk->icsk_retransmits)) {
>                 tp->prior_ssthresh = tcp_current_ssthresh(sk);
> -               tp->snd_ssthresh = icsk->icsk_ca_ops->ssthresh(sk);
> +               tp->snd_ssthresh = tcp_ca_ssthresh(sk);
>                 tcp_ca_event(sk, CA_EVENT_LOSS);
>                 tcp_init_undo(tp);
>         }

Can you explain how we can do loss recovery on a non-established
connection ....?

> @@ -3335,7 +3343,8 @@ static void tcp_cong_control(struct sock *sk, u32 ack, 
> u32 acked_sacked,
>         if (tcp_in_cwnd_reduction(sk)) {
>                 /* Reduce cwnd if state mandates */
>                 tcp_cwnd_reduction(sk, acked_sacked, flag);
> -       } else if (tcp_may_raise_cwnd(sk, flag)) {
> +       } else if (tcp_may_raise_cwnd(sk, flag) &&
> +                  inet_csk(sk)->icsk_ca_initialized) {
>                 /* Advance cwnd if state allows */
>                 tcp_cong_avoid(sk, ack, acked_sacked);

Same here.  How is this called for minisock/sk with non-inited cong ops?
Once sk moves to TCP_ESTABLISHED congestion ops are supposed to
be initialized.

If thats not the case then thats a bug and should be fixed rather
than not calling the cc state machinery any more.

Reply via email to