Li, Ji <j...@akamai.com> wrote: > In Linux 3.17 and earlier, tcp_init_congestion_ops (i.e. tcp_reno) is > used as the ca_ops during 3WHS, and after 3WHS, ca_ops is assigned as > the default congestion control set by sysctl and immediately its parameters > stored in icsk_ca_priv[] are initialized. Commit 55d8694fa82c ("net: > tcp: assign tcp cong_ops when tcp sk is created") splits assignment and > initialization into two steps: assignment is done before SYN or SYN-ACK > is sent out; initialization is done after 3WHS (assume without > fastopen). But this can cause out-of-order invocation for ca_ops functions > other than .init() during 3WHS, as they could be called before its > parameters get initialized. It may cause unexpected behavior for > congestion controls, and make troubles for those that need dynamic > object allocation, like tcp_cdg etc.
What exactly is the problem? Kernel crash? AFAICS cdg can cope with NULL ca->gradients. > We used tcp_dctcp as an example to visualize the problem, and set it as > default congestion control via sysctl. Three parameters > (ca->prior_snd_una, ca->prior_rcv_nxt, ca->dctcp_alpha) were monitored > when functions, such as dctcp_update_alpha() and dctcp_ssthresh(), are > called during 3WHS. All of three are found to be zero, which is likely > impossible if dctcp_init() was called ahead, where those three > parameters should be initialized. Some other congestion controls are > examined too and the same problem was reproduced. Why is this a problem? > diff --git a/include/net/tcp.h b/include/net/tcp.h > +{ > + if (inet_csk(sk)->icsk_ca_initialized) > + return inet_csk(sk)->icsk_ca_ops->ssthresh(sk); > + else > + return tcp_reno_ssthresh(sk); > +} > + > /* Enter Loss state. If we detect SACK reneging, forget all SACK information > * and reset tags completely, otherwise preserve SACKs. If receiver > * dropped its ofo queue, we will know this due to reneging detection. > @@ -1896,7 +1904,7 @@ void tcp_enter_loss(struct sock *sk) > !after(tp->high_seq, tp->snd_una) || > (icsk->icsk_ca_state == TCP_CA_Loss && !icsk->icsk_retransmits)) { > tp->prior_ssthresh = tcp_current_ssthresh(sk); > - tp->snd_ssthresh = icsk->icsk_ca_ops->ssthresh(sk); > + tp->snd_ssthresh = tcp_ca_ssthresh(sk); > tcp_ca_event(sk, CA_EVENT_LOSS); > tcp_init_undo(tp); > } Can you explain how we can do loss recovery on a non-established connection ....? > @@ -3335,7 +3343,8 @@ static void tcp_cong_control(struct sock *sk, u32 ack, > u32 acked_sacked, > if (tcp_in_cwnd_reduction(sk)) { > /* Reduce cwnd if state mandates */ > tcp_cwnd_reduction(sk, acked_sacked, flag); > - } else if (tcp_may_raise_cwnd(sk, flag)) { > + } else if (tcp_may_raise_cwnd(sk, flag) && > + inet_csk(sk)->icsk_ca_initialized) { > /* Advance cwnd if state allows */ > tcp_cong_avoid(sk, ack, acked_sacked); Same here. How is this called for minisock/sk with non-inited cong ops? Once sk moves to TCP_ESTABLISHED congestion ops are supposed to be initialized. If thats not the case then thats a bug and should be fixed rather than not calling the cc state machinery any more.