Hi Stephen,
Thank you for the feedback.
The main issue I wanted to address is the race condition when the flow stops.
With the old memset code, if a reset happens during a stats read/modify/write 
cycle (even with +0 modification), the zeroing can be completely lost.
My new code handles this case better under no-traffic conditions.
We do not need fully reliable statistics, so we don't need to use atomic 
operations.
This seems like a reasonable compromise for our use case.
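
For readers following the thread, the reference-value approach described in the patch can be sketched roughly as below. This is only an illustration under assumptions: the structure and function names (queue_stats, queue_count, queue_stats_reset, queue_stats_get) are hypothetical and are not taken from the net/nbl driver.

```c
#include <stdint.h>

/* Hypothetical per-queue counters; names are illustrative, not from net/nbl. */
struct queue_stats {
    uint64_t packets;
    uint64_t bytes;
};

struct queue {
    struct queue_stats live; /* written only by the Rx/Tx burst thread */
    struct queue_stats base; /* reference values, written only on reset */
};

/* Fast path: plain increments, no atomics and no locks. */
static inline void queue_count(struct queue *q, uint64_t pkts, uint64_t bytes)
{
    q->live.packets += pkts;
    q->live.bytes += bytes;
}

/* Reset: remember the current values instead of zeroing the live counters. */
static inline void queue_stats_reset(struct queue *q)
{
    q->base = q->live; /* structure copy; NOT atomic, may observe a partial update */
}

/* Read: report the difference; unsigned subtraction tolerates wraparound. */
static inline struct queue_stats queue_stats_get(const struct queue *q)
{
    struct queue_stats out;

    out.packets = q->live.packets - q->base.packets;
    out.bytes = q->live.bytes - q->base.bytes;
    return out;
}
```

Because the reset path never writes the live counters, a concurrent increment cannot undo a reset. The copy in queue_stats_reset can still race with an update in progress, so the values remain approximate.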
------------------------------------------------------------------
From: Stephen Hemminger <[email protected]>
Sent: Thursday, November 27, 2025 08:07
To: Dimon <[email protected]>
Cc: dev <[email protected]>; Kyo Liu <[email protected]>;
Leon <[email protected]>; Sam <[email protected]>
Subject: Re: [PATCH v1 1/1] net/nbl: fix Rx/Tx stats concurrency
On Tue, 25 Nov 2025 18:54:36 -0800
Dimon Zhao <[email protected]> wrote:
> Queue statistics are being continuously updated in Rx/Tx burst
> routines while handling traffic. In addition to that, statistics
> can be reset (written with zeroes) on statistics reset in other
> threads, causing a race condition, which in turn could result in
> wrong stats.
> 
> The patch provides an approach with reference values, allowing
> the actual counters to be writable within Rx/Tx burst threads
> only, and updating reference values on stats reset.
> 
> Fixes: 661c0ccf2512 ("net/nbl: support statistics")
> 
> Signed-off-by: Dimon Zhao <[email protected]>
First off, many drivers do the same thing as the current code.
I think virtio is the most commonly used driver with the same memset.
They just zero an accumulated buffer.
The SW counters are not meant to be super exact (it is not a good idea to
bill customers based on packet counts).
Your new method, using a stashed old copy, still has races.
The old code would race the zero with read/modify/write of the increment.
If the race happened, the zero could land in the middle of the modify,
causing the value not to be zeroed.
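
The lost reset can be made concrete by replaying one unlucky interleaving by hand in a single thread. This is a sketch for illustration only: the sw_stats layout and the replay_lost_reset helper are hypothetical, and a real race depends on timing between two threads rather than on a fixed order.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical software counter block. */
struct sw_stats {
    uint64_t packets;
};

/* Replays one unlucky interleaving by hand and returns the final counter. */
static uint64_t replay_lost_reset(uint64_t start, uint64_t burst)
{
    struct sw_stats s = { .packets = start };

    uint64_t tmp = s.packets;  /* datapath: read */
    memset(&s, 0, sizeof(s));  /* reset thread: the zeroing lands here */
    tmp += burst;              /* datapath: modify (burst may be 0) */
    s.packets = tmp;           /* datapath: write back, overwriting the zero */

    return s.packets;
}
```

With start = 1000 and burst = 0, the function returns 1000: the reset is lost even though the increment added nothing.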
The new code is less of a problem, but assignment is not atomic on
many platforms, especially for structure-sized objects. Therefore a
reader could still see stale data.
If you really want to have reliable statistics in SW, you
would have to use atomic operations, and pay the penalty of the
additional locked operations in the fast path.
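
As a rough illustration of that trade-off, a counter built on standard C11 atomics might look like the sketch below. The atomic_stats type and its helpers are hypothetical names, and this is not how any existing driver is known to implement its statistics.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical counter block using C11 atomics. */
struct atomic_stats {
    _Atomic uint64_t packets;
    _Atomic uint64_t bytes;
};

/* Fast path: each add is an atomic read-modify-write (a locked op on x86). */
static inline void atomic_stats_add(struct atomic_stats *s,
                                    uint64_t pkts, uint64_t bytes)
{
    atomic_fetch_add_explicit(&s->packets, pkts, memory_order_relaxed);
    atomic_fetch_add_explicit(&s->bytes, bytes, memory_order_relaxed);
}

/* Reset: a store cannot be lost inside a concurrent atomic add. */
static inline void atomic_stats_reset(struct atomic_stats *s)
{
    atomic_store_explicit(&s->packets, 0, memory_order_relaxed);
    atomic_store_explicit(&s->bytes, 0, memory_order_relaxed);
}

/* Read one counter without tearing. */
static inline uint64_t atomic_stats_packets(struct atomic_stats *s)
{
    return atomic_load_explicit(&s->packets, memory_order_relaxed);
}
```

Each counter is individually consistent here, but packets and bytes are still updated and reset as two separate operations, so a reader can see one updated and not the other.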
PS: FreeBSD has the same problem in many drivers.
