On 7/15/25 4:49 PM, Chia-Yu Chang (Nokia) wrote:
> On 7/4/25 10:53 AM, [email protected] wrote:
> [...]
>>> +}
>>> +
>>> +/* Handles AccECN option ECT and CE 24-bit byte counters update into
>>> + * the u32 value in tcp_sock. As we're processing TCP options, it is
>>> + * safe to access from - 1.
>>> + */
>>> +static inline s32 tcp_update_ecn_bytes(u32 *cnt, const char *from,
>>> + u32 init_offset) {
>>> + u32 truncated = (get_unaligned_be32(from - 1) - init_offset) &
>>> + 0xFFFFFFU;
>>> + u32 delta = (truncated - *cnt) & 0xFFFFFFU;
>>> +
>>> + /* If delta has the highest bit set (24th bit) indicating
>>> + * negative, sign extend to correct an estimation using
>>> + * sign_extend32(delta, 24 - 1)
>>> + */
>>> + delta = sign_extend32(delta, 23);
>>
>> I'm under the impression that delta could be simply:
>>
>> delta = (truncated - *cnt)
>>
>> What am I missing?
>
> Hi Paolo,
>
> I think this code is necessary to ensure delta will not a super large value
> in case of wrap adound.
>
> For instance, if truncated = 0x0000001F and *cnt = 0x00FFFFFF, then
> (truncated - *cnt) = 0xFF000020
>
> But sign_extend32(((truncated - *cnt) & 0xFFFFFFU, 23) = 0x00000020, which
> shall be corrrect.
>
> Another example, if truncated = 0x0000001F and *cnt = 0x0000003F, then
> (truncated - *cnt) = 0xFFFFFFE0
>
> And sign_extend32(((truncated - *cnt) & 0xFFFFFFU, 23) = 0xFFFFFFE0.
>
> In this latter example, both are correct.
Ok, I missed the fact that *cnt is a 24 bit integer, too. Your code
looks good.
>
> [...]
>>> a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index
>>> d98a1a17eb52..2169fd28594e 100644
>>> --- a/net/ipv4/tcp_output.c
>>> +++ b/net/ipv4/tcp_output.c
>>> @@ -385,6 +385,7 @@ static inline bool tcp_urg_mode(const struct tcp_sock
>>> *tp)
>>> #define OPTION_SMC BIT(9)
>>> #define OPTION_MPTCP BIT(10)
>>> #define OPTION_AO BIT(11)
>>> +#define OPTION_ACCECN BIT(12)
>>>
>>> static void smc_options_write(__be32 *ptr, u16 *options) { @@ -406,6
>>> +407,8 @@ struct tcp_out_options {
>>> u16 mss; /* 0 to disable */
>>> u8 ws; /* window scale, 0 to disable */
>>> u8 num_sack_blocks; /* number of SACK blocks to include */
>>> + u8 num_accecn_fields:7, /* number of AccECN fields needed */
>>> + use_synack_ecn_bytes:1; /* Use synack_ecn_bytes or not */
>>> u8 hash_size; /* bytes in hash_location */
>>> u8 bpf_opt_len; /* length of BPF hdr option */
>>> __u8 *hash_location; /* temporary pointer, overloaded */
>>> @@ -621,6 +624,8 @@ static void tcp_options_write(struct tcphdr *th, struct
>>> tcp_sock *tp,
>>> struct tcp_out_options *opts,
>>> struct tcp_key *key) {
>>> + u8 leftover_highbyte = TCPOPT_NOP; /* replace 1st NOP if avail */
>>> + u8 leftover_lowbyte = TCPOPT_NOP; /* replace 2nd NOP in
>>> + succession */
>>> __be32 *ptr = (__be32 *)(th + 1);
>>> u16 options = opts->options; /* mungable copy */
>>>
>>> @@ -656,15 +661,79 @@ static void tcp_options_write(struct tcphdr *th,
>>> struct tcp_sock *tp,
>>> *ptr++ = htonl(opts->tsecr);
>>> }
>>>
>>> + if (OPTION_ACCECN & options) {
>>> + /* Initial values for AccECN option, ordered is based on ECN
>>> field bits
>>> + * similar to received_ecn_bytes. Used for SYN/ACK AccECN
>>> option.
>>> + */
>>> + static u32 synack_ecn_bytes[3] = { 0, 0, 0 };
>>
>> I think this does not address Eric's concern on v9 WRT global variable, as
>> every CPU will still touch the same memory while accessing the above array.
>>
>>> + const u8 ect0_idx = INET_ECN_ECT_0 - 1;
>>> + const u8 ect1_idx = INET_ECN_ECT_1 - 1;
>>> + const u8 ce_idx = INET_ECN_CE - 1;
>>> + u32 e0b;
>>> + u32 e1b;
>>> + u32 ceb;
>>> + u8 len;
>>> +
>>> + if (opts->use_synack_ecn_bytes) {
>>> + e0b = synack_ecn_bytes[ect0_idx] +
>>> TCP_ACCECN_E0B_INIT_OFFSET;
>>> + e1b = synack_ecn_bytes[ect1_idx] +
>>> TCP_ACCECN_E1B_INIT_OFFSET;
>>> + ceb = synack_ecn_bytes[ce_idx] +
>>> + TCP_ACCECN_CEB_INIT_OFFSET;
>>
>> On the flip side I don't see such array modified here, not in later
>> patches?!? If so you could make it const and a global variable would be ok.
>
> Sure, I will make it as static const global variable, which I hope this is ok
> for you.
>
>
>>> +/* Calculates how long AccECN option will fit to @remaining option space.
>>> + *
>>> + * AccECN option can sometimes replace NOPs used for alignment of
>>> +other
>>> + * TCP options (up to @max_combine_saving available).
>>> + *
>>> + * Only solutions with at least @required AccECN fields are accepted.
>>> + *
>>> + * Returns: The size of the AccECN option excluding space repurposed
>>> +from
>>> + * the alignment of the other options.
>>> + */
>>> +static int tcp_options_fit_accecn(struct tcp_out_options *opts, int
>>> required,
>>> + int remaining) {
>>> + int size = TCP_ACCECN_MAXSIZE;
>>> + int max_combine_saving;
>>> +
>>> + if (opts->use_synack_ecn_bytes)
>>> + max_combine_saving = tcp_synack_options_combine_saving(opts);
>>> + else
>>> + max_combine_saving = opts->num_sack_blocks > 0 ? 2 : 0;
>>> + opts->num_accecn_fields = TCP_ACCECN_NUMFIELDS;
>>> + while (opts->num_accecn_fields >= required) {
>>> + int leftover_size = size & 0x3;
>>> + /* Pad to dword if cannot combine */
>>> + if (leftover_size > max_combine_saving)
>>> + leftover_size = -((4 - leftover_size) & 0x3);
>>
>> I *think* that with the above you mean something alike:
>>
>> size = ALIGN(size, 4);
>> leftover_size = 0
>>
>> ?
>>
>> The used code looks quite obscure to me.
>>
>> /P
>
> Indeed, I will make below changes in the next version by using ALIGN() and
> ALIGN_DOWN()
>
> Here the aim is to pad up (if max_combine_saving is not enough) or trim down
> (if max_combine saving is enough) to DWORD.
>
> And the final return size will be the the a multiple of DWORD.
>
> Would it be more readable?
>
> /* Pad to DWORD if cannot combine. Align_size represents
> * the final size to be used by AccECN options.
> * +======+=============+====================+============+
> * | size | size exceed | max_combine_saving | align_size |
> * | | DWORD | | |
> * +======+=============+====================+============+
> * | 2 | 2 | < 2 | 4 |
> * | 2 | 2 | >=2 | 0 |
> * | 5 | 1 | < 1 | 8 |
> * | 5 | 1 | >=1 | 4 |
> * | 8 | 0 | Any | 8 |
> * | 11 | 3 | < 3 | 12 |
> * | 11 | 3 | >=3 | 8 |
> * +======+=============+====================+============+
> */
> if ((size & 0x3) > max_combine_saving)
> align_size = ALIGN(size, 4);
> else
> align_size = ALIGN_DOWN(size, 4);
>
> if (remaining >= align_size) {
> size = align_size;
> break;
> }
Yes, IMHO is more readable. No need to add the table, the original
comment is clear enough.
Thanks,
Paolo