> On 11 Dec 2023, at 17:49, Roman Arutyunyan <a...@nginx.com> wrote: > > Hi, > > On Mon, Dec 11, 2023 at 01:40:06PM +0400, Sergey Kandaurov wrote: >> >>> On 30 Nov 2023, at 15:05, Roman Arutyunyan <a...@nginx.com> wrote: >>> >>> Hi, >>> >>> A number of patches discussed previously. >>> >> >> A couple more patches to follow. >> >> Last two weeks I spent pondering if we have to migrate to per-path >> congestion control and RTT estimator (as inspired by Alibaba patch >> https://mailman.nginx.org/pipermail/nginx-devel/2023-January/PNDV4MEDZPN5QJ6RZ2FQCWA7NGQ2ILSX.html), >> in order to alleviate severe congestion controller bugs in in-flight >> accounting that may result in connection stalls due to a type underflow. >> Previously existed, they become more visible within this series (that >> makes PATH_CHALLENGE frames congestion aware), such that any migration >> may end up in connection stalls, easily seen in tests. >> >> After discussion with Roman, I came to conclusion that it's possible >> to fix those bugs without touching a single context. >> >> Other than that below are various optimization I happened to see >> while testing in-flight accounting during connection migration. >> >> # HG changeset patch >> # User Sergey Kandaurov <pluk...@nginx.com> >> # Date 1702282940 -14400 >> # Mon Dec 11 12:22:20 2023 +0400 >> # Node ID 34311bcfd27d3cc420771b9349108e839d6a532e >> # Parent 3518c40102b2a0f4a5be00fef00becdff70921cb >> QUIC: reset RTT estimator for the new path. >> >> RTT is a property of the path, it must be reset on confirming a peer's >> ownership of its new address. >> >> diff --git a/src/event/quic/ngx_event_quic.c >> b/src/event/quic/ngx_event_quic.c >> --- a/src/event/quic/ngx_event_quic.c >> +++ b/src/event/quic/ngx_event_quic.c >> @@ -260,14 +260,7 @@ ngx_quic_new_connection(ngx_connection_t >> >> ngx_queue_init(&qc->free_frames); >> >> - qc->avg_rtt = NGX_QUIC_INITIAL_RTT; >> - qc->rttvar = NGX_QUIC_INITIAL_RTT / 2; >> - qc->min_rtt = NGX_TIMER_INFINITE; >> - qc->first_rtt = NGX_TIMER_INFINITE; >> - >> - /* >> - * qc->latest_rtt = 0 >> - */ >> + ngx_quic_init_rtt(qc); >> >> qc->pto.log = c->log; >> qc->pto.data = c; >> diff --git a/src/event/quic/ngx_event_quic_connection.h >> b/src/event/quic/ngx_event_quic_connection.h >> --- a/src/event/quic/ngx_event_quic_connection.h >> +++ b/src/event/quic/ngx_event_quic_connection.h >> @@ -65,6 +65,13 @@ typedef struct ngx_quic_keys_s ng >> >> #define ngx_quic_get_socket(c) ((ngx_quic_socket_t >> *)((c)->udp)) >> >> +#define ngx_quic_init_rtt(qc) >> \ >> + (qc)->avg_rtt = NGX_QUIC_INITIAL_RTT; >> \ >> + (qc)->rttvar = NGX_QUIC_INITIAL_RTT / 2; >> \ >> + (qc)->min_rtt = NGX_TIMER_INFINITE; >> \ >> + (qc)->first_rtt = NGX_TIMER_INFINITE; >> \ >> + (qc)->latest_rtt = 0; >> + >> >> typedef enum { >> NGX_QUIC_PATH_IDLE = 0, >> diff --git a/src/event/quic/ngx_event_quic_migration.c >> b/src/event/quic/ngx_event_quic_migration.c >> --- a/src/event/quic/ngx_event_quic_migration.c >> +++ b/src/event/quic/ngx_event_quic_migration.c >> @@ -181,6 +181,8 @@ valid: >> 14720)); >> qc->congestion.ssthresh = (size_t) -1; >> qc->congestion.recovery_start = ngx_current_msec; >> + >> + ngx_quic_init_rtt(qc); >> } >> >> path->validated = 1; > > Looks fine. > >> # HG changeset patch >> # User Sergey Kandaurov <pluk...@nginx.com> >> # Date 1702285561 -14400 >> # Mon Dec 11 13:06:01 2023 +0400 >> # Node ID 2e747e7b203e9b62455fc6b8457bd10879a88bec >> # Parent 34311bcfd27d3cc420771b9349108e839d6a532e >> QUIC: path aware in-flight bytes accounting. >> >> On-packet acknowledgement is made path aware, as per RFC 9000, Section 9.4: >> Packets sent on the old path MUST NOT contribute to congestion control >> or RTT estimation for the new path. >> >> To make it possible over a single congestion control context, the first >> packet >> to be sent after the new path has been validated, which includes resetting >> the >> congestion controller and RTT estimator, is now remembered in the connection. >> Packets sent previously, such as on the old path, are not taken into account. >> >> diff --git a/src/event/quic/ngx_event_quic_ack.c >> b/src/event/quic/ngx_event_quic_ack.c >> --- a/src/event/quic/ngx_event_quic_ack.c >> +++ b/src/event/quic/ngx_event_quic_ack.c >> @@ -325,6 +325,10 @@ ngx_quic_congestion_ack(ngx_connection_t >> qc = ngx_quic_get_connection(c); >> cg = &qc->congestion; >> >> + if (f->pnum < qc->rst_pnum) { >> + return; >> + } > > While this condition looks simple, there's something more complex going on. > The value for pnum corresponds to the first application level packet that is > send over the new path. Here however there's no check for packet level > because > other levels can only show up while rst_pnum is zero. I think this should be > explained in the commit log.
Thanks, added this para at the end for clarity: Note that although the packet number is saved per-connection, the added checks affect application level packets only. For non-application level packets, which are only processed prior to the handshake is complete, the remembered packet number remains set to zero. > >> + >> blocked = (cg->in_flight >= cg->window) ? 1 : 0; >> >> cg->in_flight -= f->plen; >> @@ -667,6 +671,10 @@ ngx_quic_congestion_lost(ngx_connection_ >> qc = ngx_quic_get_connection(c); >> cg = &qc->congestion; >> >> + if (f->pnum < qc->rst_pnum) { >> + return; >> + } > > The above applies here as well. > >> + >> blocked = (cg->in_flight >= cg->window) ? 1 : 0; >> >> cg->in_flight -= f->plen; >> diff --git a/src/event/quic/ngx_event_quic_connection.h >> b/src/event/quic/ngx_event_quic_connection.h >> --- a/src/event/quic/ngx_event_quic_connection.h >> +++ b/src/event/quic/ngx_event_quic_connection.h >> @@ -266,6 +266,8 @@ struct ngx_quic_connection_s { >> ngx_quic_streams_t streams; >> ngx_quic_congestion_t congestion; >> >> + uint64_t rst_pnum; /* first on validated >> path */ >> + >> off_t received; >> >> ngx_uint_t error; >> diff --git a/src/event/quic/ngx_event_quic_migration.c >> b/src/event/quic/ngx_event_quic_migration.c >> --- a/src/event/quic/ngx_event_quic_migration.c >> +++ b/src/event/quic/ngx_event_quic_migration.c >> @@ -110,6 +110,7 @@ ngx_quic_handle_path_response_frame(ngx_ >> ngx_uint_t rst; >> ngx_queue_t *q; >> ngx_quic_path_t *path, *prev; >> + ngx_quic_send_ctx_t *ctx; >> ngx_quic_connection_t *qc; >> >> qc = ngx_quic_get_connection(c); >> @@ -174,6 +175,11 @@ valid: >> } >> >> if (rst) { >> + /* prevent old path packets contribution to congestion control */ >> + >> + ctx = ngx_quic_get_send_ctx(qc, ssl_encryption_application); >> + qc->rst_pnum = ctx->pnum; >> + >> ngx_memzero(&qc->congestion, sizeof(ngx_quic_congestion_t)); >> >> qc->congestion.window = ngx_min(10 * qc->tp.max_udp_payload_size, > > Here we call a reset function for rtt below, but there's no such function for > congestion. As discussed in private, this will be addresses later while > improving congestion algorithm. > > The patch looks ok. > >> # HG changeset patch >> # User Sergey Kandaurov <pluk...@nginx.com> >> # Date 1702287236 -14400 >> # Mon Dec 11 13:33:56 2023 +0400 >> # Node ID 561791598a9610e79ea4bc24788ff887d032b3b3 >> # Parent 2e747e7b203e9b62455fc6b8457bd10879a88bec >> QUIC: abandoned PMTU discovery on the old path. >> >> It is seemingly a useless work to probe MTU on the old path, which may be >> gone. >> This also goes in line with RFC 9000, Section 8.2.4, Failed Path Validation, >> which prescribes to abandon old path validation if switching to a new path. >> >> diff --git a/src/event/quic/ngx_event_quic_migration.c >> b/src/event/quic/ngx_event_quic_migration.c >> --- a/src/event/quic/ngx_event_quic_migration.c >> +++ b/src/event/quic/ngx_event_quic_migration.c >> @@ -497,6 +497,7 @@ ngx_quic_handle_migration(ngx_connection >> } >> } >> >> + qc->path->state = NGX_QUIC_PATH_IDLE; >> qc->path->tag = NGX_QUIC_PATH_BACKUP; >> ngx_quic_path_dbg(c, "is now backup", qc->path); > > When new path validation fails, we return to the backup path with potentially > unfinished MTU discovery. We need to restart MTU discovery in this case by > calling ngx_quic_discover_path_mtu(). If the discovery was finished, the > function will do nothing. A separate case is when the path was validated, but > initial MTU (1200) was not verified due to amplification limit. This should > be > addressed as well. While restarting PMTUD sounds doable, restoring MTU path validation may need more work. I think we can postpone this patch, since it is not directly related to this series. > >> # HG changeset patch >> # User Sergey Kandaurov <pluk...@nginx.com> >> # Date 1702287237 -14400 >> # Mon Dec 11 13:33:57 2023 +0400 >> # Node ID 2f65a579a235d08902e47cf796db3c1bc1ca8790 >> # Parent 561791598a9610e79ea4bc24788ff887d032b3b3 >> QUIC: do not arm PTO timer for path validation. >> >> As per RFC 9000, Section 9.4, a separate timer is set when a PATH_CHALLENGE >> is sent. Further, PATH_CHALLENGE frames have a distinct retransmission >> policy. >> Thus, it makes no sense to arm the PTO timer for PATH_CHALLENGE, which may >> result in spurious PTO events. >> >> diff --git a/src/event/quic/ngx_event_quic_ack.c >> b/src/event/quic/ngx_event_quic_ack.c >> --- a/src/event/quic/ngx_event_quic_ack.c >> +++ b/src/event/quic/ngx_event_quic_ack.c >> @@ -752,7 +752,23 @@ ngx_quic_set_lost_timer(ngx_connection_t >> } >> } >> >> - q = ngx_queue_last(&ctx->sent); >> + /* PATH_CHALLENGE timer is separate (see RFC 9000, 9.4) */ >> + >> + for (q = ngx_queue_last(&ctx->sent); >> + q != ngx_queue_sentinel(&ctx->sent); >> + q = ngx_queue_prev(q)) >> + { >> + f = ngx_queue_data(q, ngx_quic_frame_t, queue); >> + >> + if (f->type != NGX_QUIC_FT_PATH_CHALLENGE) { >> + break; >> + } >> + } >> + >> + if (q == ngx_queue_sentinel(&ctx->sent)) { >> + continue; >> + } >> + >> f = ngx_queue_data(q, ngx_quic_frame_t, queue); >> w = (ngx_msec_int_t) >> (f->send_time + (ngx_quic_pto(c, ctx) << qc->pto_count) - >> now); >> # HG changeset patch >> # User Sergey Kandaurov <pluk...@nginx.com> >> # Date 1702287238 -14400 >> # Mon Dec 11 13:33:58 2023 +0400 >> # Node ID 4e1706b406978b8546f68a72e873c8579b8bc1d5 >> # Parent 2f65a579a235d08902e47cf796db3c1bc1ca8790 >> QUIC: do not consider PATH_CHALLENGE acknowledgment an RTT sample. >> >> Since PATH_CHALLENGE frames use a separate retransmission timer (other than >> PTO), it makes no sense to use PATH_CHALLENGE ACKs for RTT samples as well. >> This extends RFC 9002, Section 6.2.2 to ACK frames. Previously, through >> influence on RTT estimator, it could be possible to affect PTO duration, >> which is not used for resending PATH_CHALLENGE frames. >> >> diff --git a/src/event/quic/ngx_event_quic_ack.c >> b/src/event/quic/ngx_event_quic_ack.c >> --- a/src/event/quic/ngx_event_quic_ack.c >> +++ b/src/event/quic/ngx_event_quic_ack.c >> @@ -264,7 +264,9 @@ ngx_quic_handle_ack_frame_range(ngx_conn >> break; >> } >> >> - if (f->pnum == max) { >> + /* PATH_CHALLENGE timer is separate (see RFC 9000, 9.4) */ >> + >> + if (f->pnum == max && f->type != NGX_QUIC_FT_PATH_CHALLENGE) { >> st->max_pn = f->send_time; >> } > > The last two patches will be addressed separately. -- Sergey Kandaurov _______________________________________________ nginx-devel mailing list nginx-devel@nginx.org https://mailman.nginx.org/mailman/listinfo/nginx-devel