https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114531

            Bug ID: 114531
           Summary: Feature proposal for an
                    `-finline-functions-aggressive` compiler option
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: driver
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rvmallad at amazon dot com
                CC: rsandifo at gcc dot gnu.org
  Target Milestone: ---

Created attachment 57837
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57837&action=edit
patch to implement -finline-functions-aggressive option in GCC

This is a proposal for a user-visible GCC compiler option that enables
aggressive inlining. These inliner limits are currently only applied at -O3,
as internal inline parameters
(--param=early-inlining-insns=14 --param=inline-heuristics-hint-percent=600
--param=inline-min-speedup=15 --param=max-inline-insns-auto=30
--param=max-inline-insns-single=200).
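
For illustration, here is a minimal sketch of the command line that these
parameters correspond to when added on top of -O2 without the proposed option.
The parameter names and values are the ones quoted above; the compiler name
"gcc", the file "foo.c" and the helper aggressive_inline_flags() are
placeholders assumed for the example and are not taken from the attached patch.

```python
import shlex

# Inliner limits quoted in the report (stated there to be the -O3 values).
INLINE_PARAMS = {
    "early-inlining-insns": 14,
    "inline-heuristics-hint-percent": 600,
    "inline-min-speedup": 15,
    "max-inline-insns-auto": 30,
    "max-inline-insns-single": 200,
}


def aggressive_inline_flags(params=INLINE_PARAMS):
    """Return the limits as a list of --param=NAME=VALUE arguments."""
    return [f"--param={name}={value}" for name, value in params.items()]


# Hypothetical invocation: "gcc" and "foo.c" are placeholders.
cmd = ["gcc", "-O2", *aggressive_inline_flags(), "-c", "foo.c"]
print(shlex.join(cmd))
```

The printed line can be pasted into a shell or a build configuration as-is;
the sketch only assembles the arguments and does not run the compiler.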

I got some perf data for Envoy (https://github.com/envoyproxy/envoy) and SPEC
CPU2017 intrate benchmarks on C7g.2xlarge with Ubuntu22 + gcc-11.4.0. We see
perf gains of 2% - 5% on several of these benchmarks when using these
aggressive inline parameters at -O2. Attached is a patch for this change.

We do not want to apply these inline limits at -O2 itself, because one of the
SPEC CPU tests got slower with them. Also, more aggressive inlining at -O2
would make some symbols unavailable for probing/debugging (symbols that are
available when these aggressive inline params are not used).

-----------------------------------------------------------------------
Envoy load_balancer_benchmark – using only 1 CPU – Iterations, higher is better
$ bazel run -c opt //test/common/upstream:load_balancer_benchmark

bazel-envoy/external/local_config_cc/BUILD can be changed to add the inline
parameters/options.

------------------------------------------------------------------------
Benchmark Iterations           Baseline O2        + Inline Params   Gain
------------------------------------------------------------------------
benchmarkRoundRobinLoad          1518               1596           1.05x
BalancerBuild/500/50/50

benchmarkLeastRequestLoad        1465               1514           1.03x
BalancerChooseHost/100/3/1000           

benchmarkRingHashLoadBalancer      33                 34           1.03x
ChooseHost/100/65536/100000           

benchmarkMaglevLoadBalancer        69                 72           1.04x
Weighted/500/95/75/25/10000
------------------------------------------------------------------------

copies=8        -O2     -Ofast   Gain with      -O2 +           Gain with
                                 -Ofast         inlining        inlining
500.perlbench_r 36.5    34.3     94.0%          34.4            94.2%
502.gcc_r       45.4    47.6     104.8%         47.5            104.6%
505.mcf_r       44.6    48.2     108.1%         44.3            99.3%
520.omnetpp_r   22.1    24.9     112.7%         21.9            99.1%
523.xalancbmk_r 43.8    46.3     105.7%         45.4            103.7%
525.x264_r      44.3    89       200.9%         43.8            98.9%
531.deepsjeng_r 36      37.3     103.6%         37.5            104.2%
541.leela_r     33.5    33.9     101.2%         34.2            102.1%
548.exchange2_r 65.4    76.6     117.1%         65.3            99.8%
557.xz_r        19.8    19.9     100.5%         19.9            100.5%
SPECrate..base  37.1    41.6     112.1%         37.3            100.5%
-----------------------------------------------------------------------
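
The gain columns in the SPEC table above appear to be the ratio of each score
to the -O2 baseline, expressed as a percentage. A small sketch that recomputes
them from the scores in the table (the helper name gain_percent() is made up
for this example):

```python
# Scores copied from the table above: (-O2, -Ofast, -O2 + inlining).
SCORES = {
    "500.perlbench_r": (36.5, 34.3, 34.4),
    "502.gcc_r":       (45.4, 47.6, 47.5),
    "505.mcf_r":       (44.6, 48.2, 44.3),
    "520.omnetpp_r":   (22.1, 24.9, 21.9),
    "523.xalancbmk_r": (43.8, 46.3, 45.4),
    "525.x264_r":      (44.3, 89.0, 43.8),
    "531.deepsjeng_r": (36.0, 37.3, 37.5),
    "541.leela_r":     (33.5, 33.9, 34.2),
    "548.exchange2_r": (65.4, 76.6, 65.3),
    "557.xz_r":        (19.8, 19.9, 19.9),
}


def gain_percent(new, baseline):
    """Score relative to the baseline, as a percentage with one decimal."""
    return round(100.0 * new / baseline, 1)


for name, (o2, ofast, inlining) in SCORES.items():
    print(f"{name:16s} {gain_percent(ofast, o2):6.1f}% "
          f"{gain_percent(inlining, o2):6.1f}%")
```

Values above 100% mean the build scored higher than plain -O2; values below
100% mean it scored lower.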
