Hey All,

(I've been away on Holidays a few days, just catching up!)

> -----Original Message-----
> From: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> Sent: Wednesday, June 29, 2022 9:07 PM
> To: Mattias Rönnblom <hof...@lysator.liu.se>; mattias.ronnblom
> <mattias.ronnb...@ericsson.com>; Morten Brørup
> <m...@smartsharesystems.com>; dev@dpdk.org
> Cc: Van Haaren, Harry <harry.van.haa...@intel.com>; nd <n...@arm.com>; nd
> <n...@arm.com>
> Subject: RE: Service core statistics MT safety
> 
> <snip>

<big snip of previous discussions>

> > At the time of the read operation (in the global counter solution), there
> > may well be cycles consumed or calls having been made, but not yet posted.
> > The window between call having been made, and global counter having been
> > incremented (and thus made globally visible) is small, but non-zero.
> Agree. The read value is the atomic state of the system at a given instance
> (when the read was executed), though that instance happened few cycles back.
> (Just to be clear, I am fine with per-core counters)

Option 1: "Per core counters"
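
For illustration only, a rough sketch of what per-core counters could look like. All names here (the sketch_* identifiers, the lcore count and the cache-line size) are made up for the example and are not the actual rte_service.c structures. Each lcore writes only its own slot, so the writer can get away with a relaxed store instead of an atomic read-modify-write, and a reader sums the slots:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_LCORES 4   /* hypothetical; stands in for the real lcore limit */
#define SKETCH_CACHE_LINE 64  /* assumed cache-line size */

/* One slot per lcore, aligned so two lcores never share a cache line. */
struct sketch_lcore_stats {
	uint64_t calls;
	uint64_t cycles;
} __attribute__((aligned(SKETCH_CACHE_LINE)));

static struct sketch_lcore_stats sketch_stats[SKETCH_MAX_LCORES];

/*
 * Writer side: assumed to be called only by the lcore that owns the slot.
 * With a single writer per slot, a plain read followed by a relaxed atomic
 * store is enough; no locked read-modify-write instruction is needed.
 */
static inline void
sketch_stats_add(unsigned int lcore, uint64_t cycles)
{
	struct sketch_lcore_stats *st;

	assert(lcore < SKETCH_MAX_LCORES);
	st = &sketch_stats[lcore];
	__atomic_store_n(&st->calls, st->calls + 1, __ATOMIC_RELAXED);
	__atomic_store_n(&st->cycles, st->cycles + cycles, __ATOMIC_RELAXED);
}

/* Reader side: any thread may call these; the sum is a snapshot, not exact. */
static inline uint64_t
sketch_stats_calls(void)
{
	uint64_t sum = 0;

	for (unsigned int i = 0; i < SKETCH_MAX_LCORES; i++)
		sum += __atomic_load_n(&sketch_stats[i].calls, __ATOMIC_RELAXED);
	return sum;
}

static inline uint64_t
sketch_stats_cycles(void)
{
	uint64_t sum = 0;

	for (unsigned int i = 0; i < SKETCH_MAX_LCORES; i++)
		sum += __atomic_load_n(&sketch_stats[i].cycles, __ATOMIC_RELAXED);
	return sum;
}
```

The trade-off is memory (one padded slot per lcore per service) and a slower read path, in exchange for a write path with no contended cache line.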

> Agree we need atomic operations. I am not sure if __atomic_fetch_add or
> __atomic_store_n would have a large difference. __atomic_fetch_add would
> result in less number of instructions. I am fine with either.

Option 2: "Use atomics for counter increments".

> > >> I was fortunate to get some data from a real-world application, and
> > >> enabling service core stats resulted in a 7% degradation of overall
> > >> system capacity. I'm guessing atomic instructions would not make
> > >> things better.

Agreed, atomics are likely to reduce performance, but correctness is worth more
than performance.

<snip>

In my mind, any LTS/backports get the simplest, highest-confidence bugfix: using
atomics. The atomics are behind the "service stats" feature enable, so the
impact applies only when stats are enabled.

If there is still a performance hit and there are *no* MT services registered,
we could check a static-global flag and, if there are no MT services, use the
normal adds. Thoughts on such a solution, which limits the atomic perf impact to
apps with MT services?
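
To make the flag idea concrete, here is a minimal sketch. The flag name, the struct and the helper are all hypothetical (not existing rte_service.c code), and it assumes the flag is set at registration time, before any MT service starts running on more than one core:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical static-global flag: true once any MT service is registered. */
static bool sketch_mt_services_present;

/* Hypothetical stand-in for the per-service statistics fields. */
struct sketch_service {
	uint64_t calls;
	uint64_t cycles_spent;
};

/* Account one callback invocation that took 'cycles' TSC cycles. */
static inline void
sketch_account(struct sketch_service *s, uint64_t cycles)
{
	assert(s != NULL);

	if (sketch_mt_services_present) {
		/* Several lcores may update the same service: use atomics. */
		__atomic_fetch_add(&s->cycles_spent, cycles, __ATOMIC_RELAXED);
		__atomic_fetch_add(&s->calls, 1, __ATOMIC_RELAXED);
	} else {
		/* Assumed single writer per service: normal adds are enough. */
		s->cycles_spent += cycles;
		s->calls++;
	}
}
```

The cost on the non-MT path would be one well-predicted branch on a read-mostly flag; whether that actually beats the relaxed atomic add would need measuring.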

The code changes themselves are OK; I can send a patch with the fix if there's
agreement on the approach.


diff --git a/lib/eal/common/rte_service.c b/lib/eal/common/rte_service.c
index ef31b1f63c..a07c8fc2d7 100644
--- a/lib/eal/common/rte_service.c
+++ b/lib/eal/common/rte_service.c
@@ -363,9 +363,9 @@ service_runner_do_callback(struct rte_service_spec_impl *s,
                uint64_t start = rte_rdtsc();
                s->spec.callback(userdata);
                uint64_t end = rte_rdtsc();
-               s->cycles_spent += end - start;
+               __atomic_fetch_add(&s->cycles_spent, (end-start), __ATOMIC_RELAXED);
+               __atomic_fetch_add(&s->calls, 1, __ATOMIC_RELAXED);
                cs->calls_per_service[service_idx]++;
-               s->calls++;
        } else
                s->spec.callback(userdata);
 }
