Hey All, (I've been away on Holidays a few days, just catching up!)
> -----Original Message-----
> From: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> Sent: Wednesday, June 29, 2022 9:07 PM
> To: Mattias Rönnblom <hof...@lysator.liu.se>; mattias.ronnblom
> <mattias.ronnb...@ericsson.com>; Morten Brørup
> <m...@smartsharesystems.com>; dev@dpdk.org
> Cc: Van Haaren, Harry <harry.van.haa...@intel.com>; nd <n...@arm.com>; nd
> <n...@arm.com>
> Subject: RE: Service core statistics MT safety
>
> <snip>

<big snip of previous discussions>

> > At the time of the read operation (in the global counter solution), there
> > may well be cycles consumed or calls having been made, but not yet posted.
> > The window between call having been made, and global counter having been
> > incremented (and thus made globally visible) is small, but non-zero.
> Agree. The read value is the atomic state of the system at a given instance
> (when the read was executed), though that instance happened few cycles back.
> (Just to be clear, I am fine with per-core counters)

Option 1: "Per core counters"

> Agree we need atomic operations. I am not sure if __atomic_fetch_add or
> __atomic_store_n would have a large difference. __atomic_fetch_add would
> result in less number of instructions. I am fine with either.

Option 2: "Use atomics for counter increments".

> > >> I was fortunate to get some data from a real-world application, and
> > >> enabling service core stats resulted in a 7% degradation of overall
> > >> system capacity. I'm guessing atomic instructions would not make things
> > >> better.

Agree, atomics are likely to reduce performance, but correctness is worth
more than performance.

<snip>

In my mind, any LTS/backports get the simplest/highest-confidence bugfix:
using atomics. The atomics are behind the "service stats" feature enable, so
the impact is only felt when stats are enabled.

If there is still a performance hit, and there are *no* MT services
registered, we could check a static-global flag, and if there are no MT
services use the normal adds. Thoughts on such a solution to reduce atomic
perf impact only to apps with MT-services?

The code changes themselves are OK.. I can send a patch with fix if there's
agreement on the approach?


diff --git a/lib/eal/common/rte_service.c b/lib/eal/common/rte_service.c
index ef31b1f63c..a07c8fc2d7 100644
--- a/lib/eal/common/rte_service.c
+++ b/lib/eal/common/rte_service.c
@@ -363,9 +363,9 @@ service_runner_do_callback(struct rte_service_spec_impl *s,
                uint64_t start = rte_rdtsc();
                s->spec.callback(userdata);
                uint64_t end = rte_rdtsc();
-               s->cycles_spent += end - start;
+               __atomic_fetch_add(&s->cycles_spent, (end-start), __ATOMIC_RELAXED);
+               __atomic_fetch_add(&s->calls, 1, __ATOMIC_RELAXED);
                cs->calls_per_service[service_idx]++;
-               s->calls++;
        } else
                s->spec.callback(userdata);
 }
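
For discussion, here is a rough standalone sketch of what the static-global flag idea could look like. It is not taken from rte_service.c: `struct svc_stats`, `any_mt_service_registered` and `svc_stats_update()` are hypothetical names made up for illustration, and it assumes the flag is only written at service registration time, before any service core is running. The plain-add path also assumes that only one lcore ever updates a given service's counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the stats fields of struct rte_service_spec_impl. */
struct svc_stats {
	uint64_t calls;
	uint64_t cycles_spent;
};

/*
 * Hypothetical static-global flag: true once at least one MT-safe service
 * has been registered. Assumed to be set before service cores start running.
 */
static bool any_mt_service_registered;

/* Post one callback invocation's worth of statistics. */
static inline void
svc_stats_update(struct svc_stats *s, uint64_t cycles)
{
	assert(s != NULL);

	if (any_mt_service_registered) {
		/* Several lcores may run the same service: use atomic adds. */
		__atomic_fetch_add(&s->cycles_spent, cycles, __ATOMIC_RELAXED);
		__atomic_fetch_add(&s->calls, 1, __ATOMIC_RELAXED);
	} else {
		/* Single writer per service: normal adds avoid the atomic cost. */
		s->cycles_spent += cycles;
		s->calls++;
	}
}
```

In service_runner_do_callback() the call would be something like `svc_stats_update(&stats, end - start)` in place of the two counter updates. Whether the extra branch is actually cheaper than unconditional relaxed atomics would need measuring.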