<snip>
>>>>>>>>>>>> From: Mattias Rönnblom [mailto:mattias.ronnb...@ericsson.com]
>>>>>>>>>>>> Sent: Monday, 27 June 2022 13.06
>>>>>>>>>>>>
>>>>>>>>>>>> Hi.
>>>>>>>>>>>>
>>>>>>>>>>>> Is it safe to enable stats on MT safe services?
>>>>>>>>>>>>
>>>>>>>>>>>> https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L366
>>>>>>>>>>>>
>>>>>>>>>>>> It seems to me this would have to be an __atomic_add for this
>>>>>>>>>>>> code to produce deterministic results.
>>>>>>>>>>>
>>>>>>>>>>> I agree. The same goes for the 'calls' field.
>>>>>>>>>>
>>>>>>>>>> The calling function does the locking.
>>>>>>>>>> https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L398
>>>>>>>>>>
>>>>>>>>>> For more information you can look at:
>>>>>>>>>> https://github.com/DPDK/dpdk/blob/main/lib/eal/include/rte_service.h#L120
>>>>>>>>>
>>>>>>>>> What about the
>>>>>>>>> https://github.com/DPDK/dpdk/blob/main/lib/eal/common/rte_service.c#L404
>>>>>>>>> call (for MT safe services)? There's no lock held there.
>>>>>>>>
>>>>>>>> Good point. This is the case where the service running on the
>>>>>>>> service cores is MT safe. However, the stats are incremented
>>>>>>>> outside of the MT safety mechanism employed by the service. So,
>>>>>>>> yes, this and the other updates in the function
>>>>>>>> 'service_runner_do_callback' need to be done atomically.
>>>>>>>
>>>>>>> Maybe a better solution would be to move this to the core_state
>>>>>>> struct (and eliminate the "calls" field, since the same information
>>>>>>> is already in the core_state struct). The contention on these cache
>>>>>>> lines will be pretty crazy for services with a short run time (say
>>>>>>> a thousand cycles or less per call), assuming they are mapped to
>>>>>>> many cores.
>>>>>>
>>>>>> That's one option; the structures are internal as well. With this
>>>>>> option the stats need to be aggregated, which will not give an
>>>>>> accurate view. But that is the nature of statistics.
>>>>>
>>>>> Per-core counters are a very common pattern, used for the Linux MIB
>>>>> counters, for example. I'm not sure it's much less accurate. I mean,
>>>>> you just load, in quick succession, what's globally visible for the
>>>>> different per-lcore counters. If you do a relaxed store on an ARM,
>>>>> how long does it take until it's seen by someone doing a relaxed load
>>>>> on a different core? Roughly.
>>>>
>>>> I think my explanation of the problem is not clear.
>>>>
>>>> If a service is running on more than one core and the stats are per
>>>> core, the resulting statistics will not be atomic when we aggregate
>>>> them.
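[For reference, the per-lcore counter pattern being discussed could be sketched roughly as below, using the GCC __atomic built-ins the thread refers to. The struct layout and names here are illustrative only, not the actual rte_service internals: the owning lcore publishes its counters with relaxed stores, and the reader aggregates with relaxed loads.]

```c
#include <stdint.h>

#define MAX_LCORES 128

/* One statistics block per lcore, each padded to its own cache line
 * to avoid false sharing between service cores. */
struct lcore_stats {
	uint64_t calls;
	uint64_t cycles;
} __attribute__((aligned(64)));

static struct lcore_stats stats[MAX_LCORES];

/* Producer: only the owning lcore writes its slot, so a plain read
 * plus a relaxed atomic store is enough to make the new value
 * globally visible without any read-modify-write instruction. */
static inline void
stats_update(unsigned int lcore_id, uint64_t cycles)
{
	struct lcore_stats *s = &stats[lcore_id];

	__atomic_store_n(&s->calls, s->calls + 1, __ATOMIC_RELAXED);
	__atomic_store_n(&s->cycles, s->cycles + cycles, __ATOMIC_RELAXED);
}

/* Consumer: aggregate by loading each per-lcore value in quick
 * succession; the sum approximates, but is not, a single instant. */
static uint64_t
stats_total_calls(void)
{
	uint64_t sum = 0;

	for (unsigned int i = 0; i < MAX_LCORES; i++)
		sum += __atomic_load_n(&stats[i].calls, __ATOMIC_RELAXED);
	return sum;
}
```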
>>>> By making the stats per core, we will be taking out a feature which
>>>> is present currently (even though it is implemented incorrectly). As
>>>> we agree, the proposed change is a common pattern, and taking away
>>>> the atomicity of the stats might not be a problem.
>>>
>>> Isn't it just a push model, versus a pull one? Both give just an
>>> approximation, albeit a very good one, of how many cycles are spent
>>> "now" for a particular service. Isn't time a local phenomenon in an
>>> SMP system, with no global "now"? Maybe you can achieve such a thing
>>> with a transaction or handshake of some sort, but I don't see how an
>>> __atomic_add would be enough.
>>
>> If we consider a global timeline of events, using atomic operations
>> will provide a single 'now' from the reader's perspective (of course,
>> there might be writers waiting to update). Without the atomic
>> operations, there will not be a single 'now' from the reader's
>> perspective; there will be multiple read events on the timeline.
>
> At the time of the read operation (in the global counter solution),
> there may well be cycles consumed or calls having been made, but not
> yet posted. The window between the call having been made and the global
> counter having been incremented (and thus made globally visible) is
> small, but non-zero.

Agree. The read value is the atomic state of the system at a given
instant (when the read was executed), though that instant happened a few
cycles back. (Just to be clear, I am fine with per-core counters.)
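[For contrast, a minimal sketch of the shared-counter variant under discussion. A single location updated with __atomic_fetch_add lets the reader observe one consistent value with a single load, i.e. one read event, one 'now', on the global timeline. Function names here are made up for illustration:]

```c
#include <stdint.h>

/* Shared-counter variant: every service core increments the same
 * location. __atomic_fetch_add makes each increment indivisible, at
 * the cost of all cores contending for this one cache line. */
static uint64_t total_calls;

static inline void
global_stats_update(void)
{
	__atomic_fetch_add(&total_calls, 1, __ATOMIC_RELAXED);
}

static inline uint64_t
global_stats_read(void)
{
	/* One load == one read event. With per-core counters the
	 * reader instead issues one load per lcore, i.e. several read
	 * events spread out in time. */
	return __atomic_load_n(&total_calls, __ATOMIC_RELAXED);
}
```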
> (The core-local counter solution also uses atomic operations, although
> not __atomic_add, but a store for the producer and a load for the
> consumer.)

Agree, we need atomic operations. I am not sure if __atomic_fetch_add or
__atomic_store_n would make a large difference; __atomic_fetch_add would
result in fewer instructions. I am fine with either.

>>> I was fortunate to get some data from a real-world application, and
>>> enabling service core stats resulted in a 7% degradation of overall
>>> system capacity. I'm guessing atomic instructions would not make
>>> things better.
>>
>> Is the service running on multiple cores?

> Yes. I think something like 6 cores were used in this case. The effect
> will grow with the core count, obviously. On a large system, I don't
> think you will do much else but fight for this cache line.
>
> If you want to post counter updates to some shared data structure, you
> need to batch the updates to achieve reasonable efficiency. That will,
> obviously, be at the cost of accuracy, since there will be a
> significant delay between the local counter increment and the post in
> the global data structure. The system will be much less able to answer
> how many cycles had been consumed at a particular point in time.
>
> For really large counter sets, the above batched-update approach may be
> required: you simply can't afford the memory needed to duplicate the
> counter struct across all cores in the system. In my experience, this
> can still be made to meet real-world counter accuracy requirements
> (accuracy in the time dimension).

>>>>>>
>>>>>> I am also wondering if these stats are of any use other than for
>>>>>> debugging. Adding a capability to disable stats might help as well.
>>>>>
>>>>> They could be used as a crude tool to determine service core
>>>>> utilization.
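[The batched-update approach mentioned above might look something like the sketch below. The flush threshold and names are invented for illustration; the point is that the contended atomic add is paid only once per batch, with up to one batch of accuracy pending locally at any time:]

```c
#include <stdint.h>

#define FLUSH_THRESHOLD 256	/* illustrative batch size, in cycles */

static uint64_t global_cycles;		/* shared, contended cache line */
static __thread uint64_t local_cycles;	/* per-thread, uncontended */

/* Accumulate locally, and post to the shared counter only once a
 * threshold's worth of cycles has built up. Cheap in the common case;
 * the cost is a delay between local increment and global visibility. */
static inline void
batched_cycles_add(uint64_t cycles)
{
	local_cycles += cycles;
	if (local_cycles >= FLUSH_THRESHOLD) {
		__atomic_fetch_add(&global_cycles, local_cycles,
				   __ATOMIC_RELAXED);
		local_cycles = 0;
	}
}
```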
>>>>> Comparing utilization between different services running on the
>>>>> same core should be straightforward, but lcore utilization is
>>>>> harder in absolute terms. If you just look at "cycles", a
>>>>> completely idle core would look like it's very busy (basically the
>>>>> rdtsc latency added for every loop). I assume you'd have to do some
>>>>> heuristic based on both "calls" and "cycles" to get an estimate.
>>>>>
>>>>> I think service core utilization would be very useful, although
>>>>> that would require some changes in the service function signature,
>>>>> so the service can report back whether it did some useful work for
>>>>> a particular call.
>>>>>
>>>>> That would make for a DPDK 'top'. Just like 'top', it can't impose
>>>>> any serious performance degradation when used, to be really useful,
>>>>> I think.
>>>>>
>>>>> Sure, it should be possible to turn it on and off. I thought that
>>>>> was the case already?
>>>>
>>>> Thanks, yes, this exists already. Though the 'loops' counter is
>>>> outside the stats enable check; it looks like it is considered an
>>>> attribute for some reason.
>>>>
>>>>>>>
>>>>>>> Idle service cores will basically do nothing else than stall
>>>>>>> waiting for these lines, I suspect, hampering the progress of the
>>>>>>> more busy cores.
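[The "calls"-plus-"cycles" heuristic suggested above could be sketched like this. The fixed per-call overhead is an assumed, calibration-dependent constant (covering rdtsc latency and loop bookkeeping), not something the service library provides:]

```c
#include <stdint.h>

/* Rough lcore utilization estimate from the "calls" and "cycles"
 * counters. An idle service core still accumulates some cycles per
 * call, so subtract an assumed fixed per-call overhead before
 * dividing by the wall-clock cycle budget of the measurement window. */
static double
estimate_utilization(uint64_t calls, uint64_t cycles,
		     uint64_t wall_cycles, uint64_t overhead_per_call)
{
	uint64_t overhead = calls * overhead_per_call;

	if (cycles <= overhead || wall_cycles == 0)
		return 0.0;	/* effectively idle */

	double busy = (double)(cycles - overhead);

	return busy < (double)wall_cycles ? busy / (double)wall_cycles : 1.0;
}
```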