> On 10/6/2020 11:13 AM, Ananyev, Konstantin wrote: > >>> -----Original Message----- > >>> From: Jerin Jacob <jerinjac...@gmail.com> > >>> Sent: Monday, October 5, 2020 5:35 PM > >>> To: Nicolau, Radu <radu.nico...@intel.com> > >>> Cc: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>; Richardson, Bruce > >>> <bruce.richard...@intel.com>; Ananyev, Konstantin > >>> <konstantin.anan...@intel.com>; Van Haaren, Harry > >>> <harry.van.haa...@intel.com>; dev@dpdk.org; jer...@marvell.com; nd > >>> <n...@arm.com> > >>> Subject: Re: [dpdk-dev] [PATCH v1] event/sw: performance improvements > >>> > >>> On Tue, Sep 29, 2020 at 2:32 PM Nicolau, Radu <radu.nico...@intel.com> > >>> wrote: > >>>> > >>>> On 9/28/2020 5:02 PM, Honnappa Nagarahalli wrote: > >>>>> <snip> > >>>>>>> Add minimum burst throughout the scheduler pipeline and a flush > >>>>>>> counter. > >>>>>>> Replace ring API calls with local single threaded implementation where > >>>>>>> possible. > >>>>>>> > >>>>>>> Signed-off-by: Radu Nicolau mailto:radu.nico...@intel.com > >>>>>>> > >>>>>>> Thanks for the patch, a few comments inline. > >>>>>>> > >>>>>>> Why not make these APIs part of the rte_ring library? You could > >>>>>>> further > >>>>>> optimize them by keeping the indices on the same cacheline. > >>>>>>> I'm not sure there is any need for non thread-safe rings outside this > >>>>>> particular case. > >>>>>>> [Honnappa] I think if we add the APIs, we will find the use cases. > >>>>>>> But, more than that, I understand that rte_ring structure is exposed > >>>>>>> to the > >>>>>> application. The reason for doing that is the inline functions that > >>>>>> rte_ring > >>>>>> provides. IMO, we should still maintain modularity and should not use > >>>>>> the > >>>>>> internals of the rte_ring structure outside of the library. > >>>>>>> +1 to that. > >>>>>>> > >>>>>>> BTW, is there any real perf benefit from such micor-optimisation? > >>>>>> I'd tend to view these as use-case specific, and I'm not sure we > >>>>>> should clutter > >>>>>> up the ring library with yet more functions, especially since they > >>>>>> can't be > >>>>>> mixed with the existing enqueue/dequeue functions, since they don't use > >>>>>> the head pointers. > >>>>> IMO, the ring library is pretty organized with the recent addition of > >>>>> HTS/RTS > >>> modes. This can be one of the modes and should allow us to use the > >>> existing > >>> functions (though additional functions are required as well). > >>>>> The other concern I have is, this implementation can be further > >>>>> optimized by > >>> using a single cache line for the pointers. It uses 2 cache lines just > >>> because of the > >>> layout of the rte_ring structure. > >>>>> There was a question earlier about the performance improvements of this > >>> patch? Are there any % performance improvements that can be shared? > >>>>> It is also possible to change the above functions to use the head/tail > >>>>> pointers > >>> from producer or the consumer cache line alone to check for perf > >>> differences. > >>>> I don't have a % for the final improvement for this change alone, but > >>>> there was some improvement in the memory overhead measurable during > >>>> development, which very likely resulted in the whole optimization having > >>>> more headroom. > >>>> > >>>> I agree that this may be further optimized, maybe by having a local > >>>> implementation of a ring-like container instead. > >>> Have we decided on the next steps for this patch? Is the plan to > >>> supersede this patch and have different > >>> one in rte_ring subsystem, > >> My preference is to merge this version of the patch; > >> 1) The ring helper functions are stripped to the SW PMD usage, and not > >> valid to use in the general. > >> 2) Adding static inline APIs in an LTS without extensive doesn't seem a > >> good idea. > >> > >> If Honnappa is OK with the above solution for 20.11, we can see about > >> moving the rings part of the > >> code to rte_ring library location in 21.02, and give ourselves some time > >> to settle the usage/API before > >> the next LTS. > >> > > As ring library maintainer I share Honnappa concern that another library > > not uses public ring API, > > but instead accesses ring internals directly. Obviously such coding > > practice is not welcomed > > as it makes harder to maintain/extend ring library in future. > > About 2) - these new API can(/shoud) be marked an experimental anyway. > > As another thing - it is still unclear what a performance gain we are > > talking about here. > > Is it really worth it comparing to just using SP/SC? > > The change itself came after I analyzed the memory bound sections of the > code, and I just did a quick test, I got about 3.5% improvement in > throughput, maybe not so much but significant for such a small change, > and depending on the usecase it may be more. > > As for the implementation itself, I would favour having a custom ring > like container in the PMD code, this will solve the issue of using > rte_ring internals while still allow for full optimisation. If this is > acceptable, I will follow up by tomorrow.
Sounds ok to me. Thanks Konstantin