Hi John,

No, we don't think so. When we exercise these codepaths with error injection, 
the symptoms fixed by this change are markedly different from what we have 
observed, with additional logging, on the systems reported in bug 264257.

Here, we fix an issue where, when a data segment + FIN is retransmitted 
multiple times, the left edge of the segment moves right. This leaves a gap, 
which the receiver would have to request again; in the absence of SACK, no 
further progress is made until a full timeout occurs. Certainly a nuisance and 
incorrect behavior, but unlikely to be the actual root cause of bug 264257.
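
To illustrate the accounting error, here is a minimal, self-contained sketch. 
It is not the code from the commit: struct send_state and all function names 
are hypothetical, and the model assumes only what the commit message states, 
namely that a segment with data and FIN advances the sequence number by the 
payload length plus 1, and that the back-out after a transient error must undo 
both.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified send state; NOT the FreeBSD struct tcpcb. */
struct send_state {
    uint32_t snd_nxt;   /* next sequence number to be sent */
};

/* Sequence space consumed by a segment: payload bytes, plus 1 for a FIN. */
static uint32_t seg_seq_space(uint32_t len, bool fin)
{
    return len + (fin ? 1u : 0u);
}

/* Advance the sequence number before handing the segment down. */
static void advance(struct send_state *s, uint32_t len, bool fin)
{
    s->snd_nxt += seg_seq_space(len, fin);
}

/* Incomplete back-out: undoes the payload length but forgets the FIN. */
static void backout_without_fin(struct send_state *s, uint32_t len, bool fin)
{
    (void)fin;
    s->snd_nxt -= len;
}

/* Complete back-out: undoes exactly what advance() added. */
static void backout_with_fin(struct send_state *s, uint32_t len, bool fin)
{
    s->snd_nxt -= seg_seq_space(len, fin);
}

/*
 * Simulate `attempts` transmissions of a data + FIN segment that all fail
 * with a transient error, and return how far snd_nxt has drifted to the
 * right of its starting value.
 */
static uint32_t drift_after_failures(int attempts, uint32_t len, bool fixed)
{
    const uint32_t start = 1000;    /* arbitrary initial sequence number */
    struct send_state s = { .snd_nxt = start };

    assert(attempts >= 0);
    for (int i = 0; i < attempts; i++) {
        advance(&s, len, true);
        if (fixed)
            backout_with_fin(&s, len, true);
        else
            backout_without_fin(&s, len, true);
    }
    return s.snd_nxt - start;
}
```

In this toy model, every failed attempt with the incomplete back-out moves the 
starting point of the next attempt one sequence number to the right, which is 
the "left edge moves right" symptom; with the complete back-out the drift 
stays at zero.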

Michael is currently improving TCP blackbox logging in the base stack and 
providing it to the people affected, to find out why the TCPCB variables 
become erroneous.

This is because, even though we have extracted effectively full packet captures 
(lacking only proper timing information) in 3 instances, the problem cannot be 
recreated yet.

From prior logging we know that, on (very) busy servers, these state variables 
become incorrect much more frequently than expected, but typically without any 
ill effects.

Conceptually, the base TCP stack can end up in a state where multiple FIN 
bits, each with a distinct sequence number, get sent after the conclusion of 
sending actual data in the session: <SYN>[data]<FIN><FIN><FIN>

(I've seen one logged instance, where 6 consecutive <FIN>s appear to have been 
transmitted).

Bug 264257 is really due to the new "SACK rescue retransmission" feature, which 
was made active in 13.1, exposing this preexisting, unexpected behavior.

Note that other stacks (e.g. the RACK stack) are not affected by this at all, 
as the stack there ensures that all outstanding data is ACKed by the receiver 
before the <FIN> is sent out. Also, data + FIN segments are not sent by the 
RACK stack.


Best regards,
   Richard


-----Original Message-----
From: John Baldwin <[email protected]> 
Sent: Friday, 15 July 2022 19:51
To: Richard Scheffenegger <[email protected]>; [email protected]; 
[email protected]; [email protected]
Subject: Re: git: 66605ff791b1 - main - tcp: Undo the increase in sequence 
number by 1 due to the FIN flag in case of a transient error.


On 7/15/22 9:36 AM, Richard Scheffenegger wrote:
> The branch main has been updated by rscheff:
>
> URL: 
> https://cgit.FreeBSD.org/src/commit/?id=66605ff791b12a2c3bb4570379db0e14d29fca4c
>
> commit 66605ff791b12a2c3bb4570379db0e14d29fca4c
> Author:     Richard Scheffenegger <[email protected]>
> AuthorDate: 2022-07-14 00:49:10 +0000
> Commit:     Richard Scheffenegger <[email protected]>
> CommitDate: 2022-07-14 01:18:19 +0000
>
>      tcp: Undo the increase in sequence number by 1 due to the FIN flag in
>      case of a transient error.
>
>      If an error occurs while processing a TCP segment with some data and the
>      FIN flag, the back out of the sequence number advance does not take into
>      account the increase by 1 due to the FIN flag.
>
>      Reviewed By: jch, gnn, #transport, tuexen
>      Sponsored by: NetApp, Inc.
>      Differential Revision: https://reviews.freebsd.org/D2970

Is this the source of bug 264257?

--
John Baldwin
