https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101296
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Samples: 884K of event 'cycles:u', Event count (approx.): 967510000841

  Overhead  Samples  Command          Shared Object             Symbol
    13.76%   119196  milc_peak.amd64  milc_peak.amd64-m64-mine  [.] u_shift_fermion
    10.08%    87085  milc_base.amd64  milc_base.amd64-m64-mine  [.] add_force_to_mom
     9.93%    85891  milc_base.amd64  milc_base.amd64-m64-mine  [.] u_shift_fermion
     9.38%    81331  milc_peak.amd64  milc_peak.amd64-m64-mine  [.] add_force_to_mom
     9.03%    82570  milc_peak.amd64  milc_peak.amd64-m64-mine  [.] mult_su3_na
     8.55%    77803  milc_base.amd64  milc_base.amd64-m64-mine  [.] mult_su3_na
     7.41%    65641  milc_peak.amd64  milc_peak.amd64-m64-mine  [.] mult_su3_nn
     6.26%    55314  milc_base.amd64  milc_base.amd64-m64-mine  [.] mult_su3_nn
     1.48%    12876  milc_peak.amd64  milc_peak.amd64-m64-mine  [.] mult_su3_an
     1.42%    12625  milc_base.amd64  milc_base.amd64-m64-mine  [.] imp_gauge_force.constprop.0
     1.18%    10602  milc_peak.amd64  milc_peak.amd64-m64-mine  [.] imp_gauge_force.constprop.0
     1.00%     8853  milc_base.amd64  milc_base.amd64-m64-mine  [.] mult_su3_mat_vec_sum_4dir
     0.94%     8343  milc_peak.amd64  milc_peak.amd64-m64-mine  [.] mult_su3_mat_vec_sum_4dir
     0.94%     8156  milc_base.amd64  milc_base.amd64-m64-mine  [.] mult_su3_an

The odd thing is that, for example, mult_su3_an reports a vastly different
number of cycles even though the assembly is 1:1 identical.  There are in
total 16 vaddsubpd instructions in the new variant, in the symbols
add_force_to_mom (1) and mult_su3_nn (15), but that doesn't explain the
difference seen above.  More ADDSUB patterns are detected but they do not
materialize in the end; still there's some effect on RA and scheduling in
functions like u_shift_fermion.  The vectorizer dumps do not reveal anything
interesting for this example either.
I was using the following to disable the added pattern:

diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
index 2671f91972d..388b185dc7b 100644
--- a/gcc/tree-vect-slp-patterns.c
+++ b/gcc/tree-vect-slp-patterns.c
@@ -1510,7 +1510,7 @@ addsub_pattern::recognize (slp_tree_to_load_perm_map_t *, slp_tree *node_)
 {
   slp_tree node = *node_;
   if (SLP_TREE_CODE (node) != VEC_PERM_EXPR
-      || SLP_TREE_CHILDREN (node).length () != 2)
+      || SLP_TREE_CHILDREN (node).length () != 2 || 1)
     return NULL;

   /* Match a blend of a plus and a minus op with the same number of plus and

To sum up - I have no idea why performance has regressed.