On Fri, 6 May 2022 16:28:41 +0100
Bruce Richardson <bruce.richard...@intel.com> wrote:

> On Fri, May 06, 2022 at 03:12:32PM +0000, Honnappa Nagarahalli wrote:
> > <snip>  
> > > 
> > > On Thu, May 05, 2022 at 10:59:32PM +0000, Honnappa Nagarahalli wrote:  
> > > > Thanks Stephen. Do you see any performance difference with this change? 
> > > >  
> > > 
> > > as a matter of due diligence i think a comparison should be made just to 
> > > be
> > > confident nothing is regressing.
> > > 
> > > i support this change in principal since it is generally accepted best 
> > > practice to
> > > not force inlining since it can remove more valuable optimizations that 
> > > the
> > > compiler may make that the human can't see.
> > > the optimizations may vary depending on compiler implementation.
> > > 
> > > force inlining should be used as a targeted measure rather than blanket on
> > > every function and when in use probably needs to be periodically reviewed 
> > > and
> > > potentially removed as the code / compiler evolves.
> > > 
> > > also one other consideration is the impact of a particular compiler's 
> > > force
> > > inlining intrinsic/builtin is that it may permit inlining of functions 
> > > when not
> > > declared in a header. i.e. a function from one library may be able to be 
> > > inlined
> > > to another binary as a link time optimization. although everything here 
> > > is in a
> > > header so it's a bit moot.
> > > 
> > > i'd like to see this change go in if possible.  
> > Like Stephen mentions below, I am sure we will have a for and against 
> > discussion here.
> > As a DPDK community we have put performance front and center, I would 
> > prefer to go down that route first.
> >  
> 
> I ran some initial numbers with this patch, and the very quick summary of
> what I've seen so far:
> 
> * Unit tests show no major differences, and while it depends on what
>   specific number you are interested in, most seem within margin of error.
> * Within unit tests, the one number I mostly look at when considering
>   inlining is the "empty poll" cost, since I believe we should look to keep
>   that as close to zero as possible. In the past I've seen that number jump
>   from 3 cycles to 12 cycles due to missed inlining. In this case, it seem
>   fine.
> * Ran a quick test with the eventdev_pipeline example app using SW eventdev,
>   as a test of an actual app which is fairly ring-heavy [used 8 workers
>   with 1000 cycles per packet hop]. (Thanks to Harry vH for this suggestion
>   of a workload)
>   * GCC 8 build - no difference observed
>   * GCC 11 build - approx 2% perf reduction observed
> 
> As I said, these are just some quick rough numbers, and I'll try and get
> some more numbers on a couple of different platforms, see if the small
> reduction seen is consistent or not. I may also test a few differnet
> combinations/options in the eventdev test.  It would be good if others also
> tested on a few platforms available to them.
> 
> /Bruce

I wonder if a mixed approach might help where some key bits were marked
as more important to inline? Or setting compiler flags in build infra?

Reply via email to