Re: [dpdk-dev] [PATCH v1] event/sw: performance improvements

Ananyev, Konstantin Wed, 07 Oct 2020 03:53:18 -0700

> On 10/6/2020 11:13 AM, Ananyev, Konstantin wrote:
> >>> -----Original Message-----
> >>> From: Jerin Jacob <[email protected]>
> >>> Sent: Monday, October 5, 2020 5:35 PM
> >>> To: Nicolau, Radu <[email protected]>
> >>> Cc: Honnappa Nagarahalli <[email protected]>; Richardson, Bruce
> >>> <[email protected]>; Ananyev, Konstantin
> >>> <[email protected]>; Van Haaren, Harry
> >>> <[email protected]>; [email protected]; [email protected]; nd
> >>> <[email protected]>
> >>> Subject: Re: [dpdk-dev] [PATCH v1] event/sw: performance improvements
> >>>
> >>> On Tue, Sep 29, 2020 at 2:32 PM Nicolau, Radu <[email protected]> 
> >>> wrote:
> >>>>
> >>>> On 9/28/2020 5:02 PM, Honnappa Nagarahalli wrote:
> >>>>> <snip>
> >>>>>>> Add minimum burst throughout the scheduler pipeline and a flush 
> >>>>>>> counter.
> >>>>>>> Replace ring API calls with local single threaded implementation where
> >>>>>>> possible.
> >>>>>>>
> >>>>>>> Signed-off-by: Radu Nicolau mailto:[email protected]
> >>>>>>>
> >>>>>>> Thanks for the patch, a few comments inline.
> >>>>>>>
> >>>>>>> Why not make these APIs part of the rte_ring library? You could 
> >>>>>>> further
> >>>>>> optimize them by keeping the indices on the same cacheline.
> >>>>>>> I'm not sure there is any need for non thread-safe rings outside this
> >>>>>> particular case.
> >>>>>>> [Honnappa] I think if we add the APIs, we will find the use cases.
> >>>>>>> But, more than that, I understand that rte_ring structure is exposed 
> >>>>>>> to the
> >>>>>> application. The reason for doing that is the inline functions that 
> >>>>>> rte_ring
> >>>>>> provides. IMO, we should still maintain modularity and should not use 
> >>>>>> the
> >>>>>> internals of the rte_ring structure outside of the library.
> >>>>>>> +1 to that.
> >>>>>>>
> >>>>>>> BTW, is there any real perf benefit from such micor-optimisation?
> >>>>>> I'd tend to view these as use-case specific, and I'm not sure we 
> >>>>>> should clutter
> >>>>>> up the ring library with yet more functions, especially since they 
> >>>>>> can't be
> >>>>>> mixed with the existing enqueue/dequeue functions, since they don't use
> >>>>>> the head pointers.
> >>>>> IMO, the ring library is pretty organized with the recent addition of 
> >>>>> HTS/RTS
> >>> modes. This can be one of the modes and should allow us to use the 
> >>> existing
> >>> functions (though additional functions are required as well).
> >>>>> The other concern I have is, this implementation can be further 
> >>>>> optimized by
> >>> using a single cache line for the pointers. It uses 2 cache lines just 
> >>> because of the
> >>> layout of the rte_ring structure.
> >>>>> There was a question earlier about the performance improvements of this
> >>> patch? Are there any % performance improvements that can be shared?
> >>>>> It is also possible to change the above functions to use the head/tail 
> >>>>> pointers
> >>> from producer or the consumer cache line alone to check for perf 
> >>> differences.
> >>>> I don't have a % for the final improvement for this change alone, but
> >>>> there was some improvement in the memory overhead measurable during
> >>>> development, which very likely resulted in the whole optimization having
> >>>> more headroom.
> >>>>
> >>>> I agree that this may be further optimized, maybe by having a local
> >>>> implementation of a ring-like container instead.
> >>> Have we decided on the next steps for this patch? Is the plan to
> >>> supersede this patch and have different
> >>> one in rte_ring subsystem,
> >> My preference is to merge this version of the patch;
> >> 1) The ring helper functions are stripped to the SW PMD usage, and not 
> >> valid to use in the general.
> >> 2) Adding static inline APIs in an LTS without extensive doesn't seem a 
> >> good idea.
> >>
> >> If Honnappa is OK with the above solution for 20.11, we can see about 
> >> moving the rings part of the
> >> code to rte_ring library location in 21.02, and give ourselves some time 
> >> to settle the usage/API before
> >> the next LTS.
> >>
> > As ring library maintainer I share Honnappa concern that another library 
> > not uses public ring API,
> > but instead accesses ring internals directly. Obviously such coding 
> > practice is not welcomed
> > as it makes harder to maintain/extend ring library in future.
> > About 2) - these new API can(/shoud) be marked an experimental anyway.
> > As another thing - it is still unclear what a performance gain we are 
> > talking about here.
> > Is it really worth it comparing to just using SP/SC?
> 
> The change itself came after I analyzed the memory bound sections of the
> code, and I just did a quick test, I got about 3.5% improvement in
> throughput,  maybe not so much but significant for such a small change,
> and depending on the usecase it may be more.
> 
> As for the implementation itself, I would favour having a custom ring
> like container in the PMD code, this will solve the issue of using
> rte_ring internals while still allow for full optimisation. If this is
> acceptable, I will follow up by tomorrow.


Sounds ok to me.
Thanks
Konstantin

Re: [dpdk-dev] [PATCH v1] event/sw: performance improvements

Reply via email to