On Tue, Sep 19, 2017 at 4:04 AM, Oleksandr Natalenko <oleksa...@natalenko.name> wrote:
> Hi.
>
> 18.09.2017 23:40, Yuchung Cheng wrote:
>>
>> I assume this kernel does not have the patch that Neal proposed in his
>> first reply?
>
>
> Correct.
>
>> The main warning needs to be triggered by another peculiar SACK that
>> kicks the sender into recovery again (after undo). Please let it run
>> longer if possible to see if we can get both. But the new data does
>> indicate that we can (validly) be in CA_Open with retrans_out > 0.
>
>
> OK, here it is:
>
> ===
> » LC_TIME=C jctl -kb | grep RIP
> …
> Sep 19 12:54:03 defiant kernel: RIP: 0010:tcp_undo_cwnd_reduction+0xbd/0xd0
> Sep 19 12:54:22 defiant kernel: RIP: 0010:tcp_undo_cwnd_reduction+0xbd/0xd0
> Sep 19 12:54:25 defiant kernel: RIP: 0010:tcp_undo_cwnd_reduction+0xbd/0xd0
> Sep 19 12:56:00 defiant kernel: RIP: 0010:tcp_fastretrans_alert+0x7c8/0x990
> Sep 19 12:57:07 defiant kernel: RIP: 0010:tcp_undo_cwnd_reduction+0xbd/0xd0
> Sep 19 12:57:14 defiant kernel: RIP: 0010:tcp_undo_cwnd_reduction+0xbd/0xd0
> Sep 19 12:58:04 defiant kernel: RIP: 0010:tcp_undo_cwnd_reduction+0xbd/0xd0
> …
> ===
>
> Note timestamps: two types of warning are distant in time, so didn't happen
> at once.
>
> While still running this kernel, anything else I can check for you? Thanks.

Based on all the experiments you did, I believe some code path other than
the one in my hypothesis is causing the warning:
1) Neal's proposed F-RTO fix didn't work
2) The main warning is not being triggered together with the
   newly-instrumented warning in undo
3) Disabling RACK stopped the warning

We couldn't figure out exactly what it is, so we'll do a bit of code auditing
first to find more suspects.
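
In case it helps when comparing further runs, here is a rough sketch of how the "not triggered together" observation could be checked mechanically against a journal excerpt. It only uses the seven lines quoted above (the elided "…" entries are not guessed at). The helper names, the regex, and the year 2017 (taken from the mail date, since the journal lines carry no year) are my own assumptions, not anything from the kernel or from journalctl itself.

```python
import re
from datetime import datetime

# Lines copied verbatim from the quoted journal excerpt; elided entries are omitted.
LOG = """\
Sep 19 12:54:03 defiant kernel: RIP: 0010:tcp_undo_cwnd_reduction+0xbd/0xd0
Sep 19 12:54:22 defiant kernel: RIP: 0010:tcp_undo_cwnd_reduction+0xbd/0xd0
Sep 19 12:54:25 defiant kernel: RIP: 0010:tcp_undo_cwnd_reduction+0xbd/0xd0
Sep 19 12:56:00 defiant kernel: RIP: 0010:tcp_fastretrans_alert+0x7c8/0x990
Sep 19 12:57:07 defiant kernel: RIP: 0010:tcp_undo_cwnd_reduction+0xbd/0xd0
Sep 19 12:57:14 defiant kernel: RIP: 0010:tcp_undo_cwnd_reduction+0xbd/0xd0
Sep 19 12:58:04 defiant kernel: RIP: 0010:tcp_undo_cwnd_reduction+0xbd/0xd0
"""

# Hypothetical pattern for "<date> <host> kernel: RIP: <seg>:<function>+<off>/<len>".
LINE_RE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ kernel: RIP: \w+:(\w+)\+")


def parse(log, year=2017):
    """Return a list of (timestamp, function name) pairs, one per RIP line."""
    events = []
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if m:
            # The year is assumed; only time differences matter here.
            ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            events.append((ts, m.group(2)))
    return events


def min_gap_seconds(events, func_a, func_b):
    """Smallest time distance, in seconds, between any func_a and func_b warning."""
    a = [t for t, f in events if f == func_a]
    b = [t for t, f in events if f == func_b]
    return min(abs((x - y).total_seconds()) for x in a for y in b)


if __name__ == "__main__":
    gap = min_gap_seconds(parse(LOG), "tcp_undo_cwnd_reduction", "tcp_fastretrans_alert")
    print(f"closest undo/fastretrans warnings are {gap:.0f}s apart")
```

On the quoted excerpt the closest pair is 12:56:00 versus 12:57:07, i.e. more than a minute apart, which is consistent with point 2) above. It says nothing about the entries hidden behind the "…" lines.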