https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108629

            Bug ID: 108629
           Summary: 549.fotonik3d_r regresses 15-24% at -O2 -flto
                    -march=x86-64-v3 since r13-1203-g038b077689bb53
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jamborm at gcc dot gnu.org
                CC: rsandifo at gcc dot gnu.org
            Blocks: 26163
  Target Milestone: ---
              Host: x86_64-linux
            Target: x86_64-linux

When benchmarking trunk revision 99ea0d76116 I noticed a 24%
regression on Zen4 and Zen3 machines and a 16% regression on a Zen2 and
an Intel CascadeLake machine when running 549.fotonik3d_r from the SPEC
2017 FPrate suite built with options -O2 -g -march=x86-64-v3 -flto=32,
compared to the binary produced by GCC 12.

On the Zen3 machine, the number of branches reported by perf stat
jumped by 90% going from GCC 12 to the aforementioned trunk revision.

The symbol profile changed from:

  Overhead  Samples  Shared object           Name
  33.23%    40078    fotonik3d_r_peak.gcc12  __upml_mod_MOD_upml_updatee_simple.lto_priv.0
  27.74%    33471    fotonik3d_r_peak.gcc12  __upml_mod_MOD_upml_updateh
  17.50%    21114    fotonik3d_r_peak.gcc12  __material_mod_MOD_mat_updatee
  9.52%     11493    fotonik3d_r_peak.gcc12  __update_mod_MOD_updateh
  9.49%     11445    fotonik3d_r_peak.gcc12  __power_mod_MOD_power_dft

To:

  Overhead  Samples  Shared object           Name
  26.68%    39825    fotonik3d_r_peak.trunk  __upml_mod_MOD_upml_updatee_simple.lto_priv.0
  22.35%    33368    fotonik3d_r_peak.trunk  __upml_mod_MOD_upml_updateh
  13.99%    20892    fotonik3d_r_peak.trunk  __material_mod_MOD_mat_updatee
  13.96%    20816    fotonik3d_r_peak.trunk  __power_mod_MOD_power_dft
  11.51%    17164    libgcc_s.so.1           __muldc3
  8.60%     12840    fotonik3d_r_peak.trunk  __update_mod_MOD_updateh
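
As a rough cross-check of the two tables (a sketch only, not part of
the original measurements), the total sample count of each run can be
estimated by dividing a row's sample count by its overhead fraction.
The ratio of the two totals approximates the slowdown, and the
per-symbol sample differences show where the extra samples landed.
The script assumes both profiles were sampled at the same rate; the
helper names are made up for illustration.

```python
# (overhead in percent, sample count) per symbol, copied from the perf tables.
GCC12 = {
    "__upml_mod_MOD_upml_updatee_simple.lto_priv.0": (33.23, 40078),
    "__upml_mod_MOD_upml_updateh": (27.74, 33471),
    "__material_mod_MOD_mat_updatee": (17.50, 21114),
    "__update_mod_MOD_updateh": (9.52, 11493),
    "__power_mod_MOD_power_dft": (9.49, 11445),
}
TRUNK = {
    "__upml_mod_MOD_upml_updatee_simple.lto_priv.0": (26.68, 39825),
    "__upml_mod_MOD_upml_updateh": (22.35, 33368),
    "__material_mod_MOD_mat_updatee": (13.99, 20892),
    "__power_mod_MOD_power_dft": (13.96, 20816),
    "__muldc3": (11.51, 17164),
    "__update_mod_MOD_updateh": (8.60, 12840),
}


def implied_total(profile):
    """Estimate the total sample count as the mean of samples / overhead."""
    estimates = [samples / (pct / 100.0) for pct, samples in profile.values()]
    return sum(estimates) / len(estimates)


def sample_deltas(old, new):
    """Per-symbol change in samples; a symbol absent from `old` counts as 0."""
    return {
        name: samples - old.get(name, (0.0, 0))[1]
        for name, (_, samples) in new.items()
    }


# Assumption: equal sampling rate in both runs, so samples scale with run time.
slowdown = implied_total(TRUNK) / implied_total(GCC12)
deltas = sample_deltas(GCC12, TRUNK)

print(f"implied total samples, trunk / gcc12: {slowdown:.2f}")
for name, delta in sorted(deltas.items(), key=lambda kv: -kv[1]):
    print(f"{delta:+7d}  {name}")
```

Under that assumption the ratio is in line with the measured 24%
regression, and the bulk of the additional samples fall into
__muldc3 and __power_mod_MOD_power_dft.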


On the Zen3 machine at least, I have bisected this to:

  commit 038b077689bb5310386b04d40a2cea234f01e6aa
  Author: Richard Sandiford <richard.sandif...@arm.com>
  Date:   Wed Jun 22 11:27:15 2022 +0100

    data-ref: Improve non-loop disambiguation [PR106019]

    When dr_may_alias_p is called without a loop context, it tries
    to use the tree-affine interface to calculate the difference
    between the two addresses and use that difference to check whether
    the gap between the accesses is known at compile time.  However, as the
    example in the PR shows, this doesn't expand SSA_NAMEs and so can easily
    be defeated by things like reassociation.

    One fix would have been to use aff_combination_expand to expand the
    SSA_NAMEs, but we'd then need some way of maintaining the associated
    cache.  This patch instead reuses the innermost_loop_behavior fields
    (which exist even when no loop context is provided).

    It might still be useful to do the aff_combination_expand thing too,
    if an example turns out to need it.

    gcc/
            PR tree-optimization/106019
            * tree-data-ref.cc (dr_may_alias_p): Try using the
            innermost_loop_behavior to disambiguate non-loop queries.

    gcc/testsuite/
            PR tree-optimization/106019
            * gcc.dg/vect/bb-slp-pr106019.c: New test.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)
