> -----Original Message-----
> From: Morten Brørup <m...@smartsharesystems.com>
> Sent: Friday, July 1, 2022 9:52 PM
> To: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>; Van Haaren, Harry
> <harry.van.haa...@intel.com>; Mattias Rönnblom <hof...@lysator.liu.se>;
> mattias.ronnblom <mattias.ronnb...@ericsson.com>; dev@dpdk.org
> Cc: nd <n...@arm.com>; nd <n...@arm.com>; nd <n...@arm.com>
> Subject: RE: Service core statistics MT safety
>
> > From: Honnappa Nagarahalli [mailto:honnappa.nagaraha...@arm.com]
> > Sent: Friday, 1 July 2022 20.38
> >
> > <snip>
> > > In my mind, any LTS/backports get the simplest/highest-confidence
> > > bugfix: using atomics.
> > > The atomics are behind the "service stats" feature enable, so impact
> > > is only when those are enabled.
> > >
> > > If there is still a performance hit, and there are *no* MT services
> > > registered, we could check a static-global flag, and if there are no
> > > MT services use the normal adds. Thoughts on such a solution to
> > > reduce the atomic perf impact only to apps with MT services?
> >
> > The choice is not between atomic stats and non-atomic stats. The stats
> > need to be incremented atomically in both cases (MT or no MT services),
> > as the reader could be another thread. We need to ensure that the
> > stores and loads are not split.
> >
> > Hence, we need atomic operations. We cannot avoid a performance hit here.
> >
> > I think the choice is between per-core stats and global stats. IMO, we
> > should go with per-core stats to be aligned with the norm.
>
> +1
>
> And the per-core counters can be summed when read for statistics, so it
> still looks like one counter per service. They are only updated per-core
> to prevent cache thrashing.

Yes, understood and agreed. Per-core counters are preferred, as relaxed
atomic stores are enough to ensure non-tearing loads. The summation is
then done by the stats-requesting thread before reporting back.

For backporting, per-core counters is a significant change. Is using
atomics to fix the misbehaviour a better idea?

My question below is still open: is the change below enough to fix the
*functional* part of MT services?
Code today uses normal ADD/INCQ instructions (and hence the MT increment
tear/clobber bug exists):

   0x000055555649189d <+141>: add  %rax,0x50(%rbx)
   0x00005555564918a1 <+145>: incq 0x48(%rbx)

For x86, my disassembly for the RELAXED and SEQ_CST orderings is the
same, using the LOCK prefix ("proper atomics"):

   0x000055555649189d <+141>: lock add  %rax,0x50(%rbx)
   0x00005555564918a2 <+146>: lock incq 0x48(%rbx)

My understanding is that since 2x threads can write the same address,
SEQ_CST is required. Can somebody more familiar with C11 confirm that?

> > > The code changes themselves are OK. I can send a patch with the fix
> > > if there's agreement on the approach?
> > >
> > > diff --git a/lib/eal/common/rte_service.c b/lib/eal/common/rte_service.c
> > > index ef31b1f63c..a07c8fc2d7 100644
> > > --- a/lib/eal/common/rte_service.c
> > > +++ b/lib/eal/common/rte_service.c
> > > @@ -363,9 +363,9 @@ service_runner_do_callback(struct rte_service_spec_impl *s,
> > >  		uint64_t start = rte_rdtsc();
> > >  		s->spec.callback(userdata);
> > >  		uint64_t end = rte_rdtsc();
> > > -		s->cycles_spent += end - start;
> > > +		__atomic_fetch_add(&s->cycles_spent, (end - start), __ATOMIC_RELAXED);
> > > +		__atomic_fetch_add(&s->calls, 1, __ATOMIC_RELAXED);
> > >  		cs->calls_per_service[service_idx]++;
> > > -		s->calls++;
> > > 	} else
> > > 		s->spec.callback(userdata);
> > > }