> -----Original Message-----
> From: Morten Brørup <m...@smartsharesystems.com>
> Sent: Friday, July 1, 2022 9:52 PM
> To: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>; Van Haaren, Harry
> <harry.van.haa...@intel.com>; Mattias Rönnblom <hof...@lysator.liu.se>;
> mattias.ronnblom <mattias.ronnb...@ericsson.com>; dev@dpdk.org
> Cc: nd <n...@arm.com>; nd <n...@arm.com>; nd <n...@arm.com>
> Subject: RE: Service core statistics MT safety
> 
> > From: Honnappa Nagarahalli [mailto:honnappa.nagaraha...@arm.com]
> > Sent: Friday, 1 July 2022 20.38
> >
> > <snip>

<snip>

> > > In my mind, any LTS/backports get the simplest/highest-confidence
> > bugfix: using
> > > atomics.
> > > The atomics are behind the "service stats" feature enable, so impact
> > is only
> > > when those are enabled.
> > >
> > > If there is still a performance hit, and there are *no* MT services
> > registered, we
> > > could check a static-global flag, and if there are no MT services use
> > the normal
> > > adds. Thoughts on such a solution to reduce atomic perf impact only
> > to apps
> > > with MT-services?
> > The choice is not between atomic stats and non-atomic stats. The stats
> > need to be incremented atomically in both cases(MT or no MT services)
> > as the reader could be another thread. We need to ensure that the
> > stores and loads are not split.
> >
> > Hence, we need atomic operations. We cannot avoid any performance hit here.
> >
> > I think the choice is between per-core stats vs global stats. IMO, we
> > should go with per-core stats to be aligned with the norm.
> +1
> 
> And the per-core counters can be summed when read for statistics, so it
> still looks like one counter per service. They are only updated per-core
> to prevent cache thrashing.

Yes, understood and agree. Per-core counters are preferred, as relaxed
atomic stores are enough to ensure loads cannot tear; the stats-requesting
thread then sums the per-core values before reporting them back.
For backporting, however, per-core counters are a significant change.
Is using atomics to fix the misbehaviour a better idea there?
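
For illustration, a minimal sketch of the per-core scheme (hypothetical
names, not the actual rte_service fields): each lcore writes only its own
slot, so relaxed atomic loads/stores are enough to prevent torn values,
and the reader sums the slots:

    #include <stdint.h>

    #define MAX_LCORES 128

    /* One slot per lcore, cache-line aligned to avoid false sharing. */
    struct core_stats {
            uint64_t calls;
    } __attribute__((aligned(64)));

    static struct core_stats stats[MAX_LCORES];

    /* Fast path: only the owning lcore writes its slot, so a relaxed
     * atomic store is enough to keep the 64-bit value from tearing. */
    static inline void
    stats_inc(unsigned int lcore_id)
    {
            uint64_t v = __atomic_load_n(&stats[lcore_id].calls,
                            __ATOMIC_RELAXED);
            __atomic_store_n(&stats[lcore_id].calls, v + 1,
                            __ATOMIC_RELAXED);
    }

    /* Read side: sum all slots, so it still looks like one counter. */
    static uint64_t
    stats_read(void)
    {
            uint64_t total = 0;
            unsigned int i;

            for (i = 0; i < MAX_LCORES; i++)
                    total += __atomic_load_n(&stats[i].calls,
                                    __ATOMIC_RELAXED);
            return total;
    }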

My question below is still open: is the below enough to fix the
*functional* part of MT services?

The code today uses normal ADD/INCQ instructions, hence the MT-increment
bug where concurrent updates can tear or clobber each other:
   0x000055555649189d <+141>:   add    %rax,0x50(%rbx)
   0x00005555564918a1 <+145>:   incq   0x48(%rbx)

For x86, my disassembly of the RELAXED and SEQ_CST orderings is the same,
both using the LOCK prefix ("proper atomics"):
   0x000055555649189d <+141>:   lock add %rax,0x50(%rbx)
   0x00005555564918a2 <+146>:   lock incq 0x48(%rbx)

My understanding is that since two threads can write to the same address,
SEQ_CST is required. Can somebody more familiar with C11 confirm that?
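
For concreteness, a standalone sketch of the two variants ("counter" is a
made-up variable, not an rte_service field). If I read the C11 model
correctly, both are atomic read-modify-writes, and the ordering argument
only controls fencing against *other* memory accesses, which would explain
the identical codegen above:

    #include <stdint.h>

    static uint64_t counter;

    /* Atomic RMW: concurrent increments cannot tear or clobber. */
    static inline void
    inc_relaxed(void)
    {
            __atomic_fetch_add(&counter, 1, __ATOMIC_RELAXED);
    }

    /* Same atomicity; additionally a full barrier for the surrounding
     * loads/stores. */
    static inline void
    inc_seq_cst(void)
    {
            __atomic_fetch_add(&counter, 1, __ATOMIC_SEQ_CST);
    }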


> > > The code changes themselves are OK. I can send a patch with the fix
> > > if there's agreement on the approach?
> > >
> > >
> > > diff --git a/lib/eal/common/rte_service.c b/lib/eal/common/rte_service.c
> > > index ef31b1f63c..a07c8fc2d7 100644
> > > --- a/lib/eal/common/rte_service.c
> > > +++ b/lib/eal/common/rte_service.c
> > > @@ -363,9 +363,9 @@ service_runner_do_callback(struct rte_service_spec_impl *s,
> > >                 uint64_t start = rte_rdtsc();
> > >                 s->spec.callback(userdata);
> > >                 uint64_t end = rte_rdtsc();
> > > -               s->cycles_spent += end - start;
> > > +               __atomic_fetch_add(&s->cycles_spent, (end-start), __ATOMIC_RELAXED);
> > > +               __atomic_fetch_add(&s->calls, 1, __ATOMIC_RELAXED);
> > >                 cs->calls_per_service[service_idx]++;
> > > -               s->calls++;
> > >         } else
> > >                 s->spec.callback(userdata);
> > > }
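
If we go with this patch, the read side presumably wants matching atomic
loads too, so the stats-requesting thread cannot observe a torn value
either. A sketch of a hypothetical helper (not part of the patch above):

    /* Pair the atomic stores with atomic loads on the read side, so a
     * plain 64-bit load cannot tear on platforms where that is possible. */
    static uint64_t
    service_stats_calls(const struct rte_service_spec_impl *s)
    {
            return __atomic_load_n(&s->calls, __ATOMIC_RELAXED);
    }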
