Richard Biener <rguenth at gcc dot gnu.org> changed:
What | Removed | Added
Last reconfirmed | | 2018-04-10
Target Milestone | --- | 8.0
Summary | 436.cactusADM regressed by 6-8% percent with -Ofast on Zen, compared to gcc 7.2 | [8 regression] 436.cactusADM regressed by 6-8% percent with -Ofast on Zen and Haswell, compared to gcc 7.2
Ever confirmed | 0 | 1
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
I also see this on Haswell, where the regression is more like 10-14% depending on which parts you look at.
The bisection data is a bit weird:
date          revision  base  peak
201710240032  r254030   48.3  52.2
201710230039  r253996   64.7  57.2
201710221240  r253982   64.6  65.8
201710210035  r253966   65.6  65.2
where base is -Ofast -march=haswell and peak adds -flto.
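To see why the bisection looks split, it helps to compute the step-to-step change for each column separately. A minimal Python sketch, assuming (as with SPEC ratios) that a higher score means faster; the data is copied from the table above and the helper names are hypothetical:

```python
# Bisection data from the table above, oldest first:
# (date/time stamp, revision, base score, peak score).
RUNS = [
    ("201710210035", "r253966", 65.6, 65.2),
    ("201710221240", "r253982", 64.6, 65.8),
    ("201710230039", "r253996", 64.7, 57.2),
    ("201710240032", "r254030", 48.3, 52.2),
]


def step_changes(runs):
    """Yield (old_rev, new_rev, metric, percent change) for each adjacent pair."""
    for (_, old_rev, old_base, old_peak), (_, new_rev, new_base, new_peak) in zip(runs, runs[1:]):
        yield old_rev, new_rev, "base", 100.0 * (new_base - old_base) / old_base
        yield old_rev, new_rev, "peak", 100.0 * (new_peak - old_peak) / old_peak


def big_drops(runs, threshold_pct=-5.0):
    """Return steps whose score fell by at least |threshold_pct| percent (higher score = faster)."""
    return [
        (old_rev, new_rev, metric, round(pct, 1))
        for old_rev, new_rev, metric, pct in step_changes(runs)
        if pct <= threshold_pct
    ]


if __name__ == "__main__":
    for drop in big_drops(RUNS):
        print(drop)
```

On these four runs it reports peak (LTO) dropping about 13% between r253982 and r253996, while base only drops (by about 25%) between r253996 and r254030, with peak losing a further ~9% there. That split is why two revision ranges, and two candidate commits, come into play.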
Note that around this time I may have disabled address-space randomization, in case the issue is similar to PR82362. I don't remember exactly, so I'd have to reproduce the regression around these revisions.
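Uncertainty like this can be avoided in later runs by recording the ASLR setting next to each result. A minimal Python sketch, assuming a Linux host: it reads /proc/sys/kernel/randomize_va_space and maps the value to the meanings given in the kernel's sysctl documentation. The function names are hypothetical, not part of any existing benchmarking tool.

```python
from pathlib import Path

# Meanings of kernel.randomize_va_space per the Linux sysctl documentation.
ASLR_MODES = {
    0: "disabled",
    1: "conservative (stack, mmap base, VDSO randomized)",
    2: "full (conservative plus heap/brk randomized)",
}


def describe_aslr(raw: str) -> str:
    """Translate the raw sysctl text (e.g. '2\\n') into a readable label."""
    value = int(raw.strip())
    return ASLR_MODES.get(value, f"unknown ({value})")


def current_aslr(path: str = "/proc/sys/kernel/randomize_va_space") -> str:
    """Return the host's ASLR mode, or a note if the file is missing (non-Linux)."""
    p = Path(path)
    if not p.exists():
        return "unavailable (not Linux?)"
    return describe_aslr(p.read_text())


if __name__ == "__main__":
    # Log this alongside each benchmark score so runs can be compared later.
    print("ASLR:", current_aslr())
```

To disable randomization for a single benchmark process instead of system-wide, util-linux's `setarch "$(uname -m)" -R <command>` is one option.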
Between r253982 and r253996, the likely culprit is
r253993 | hubicka | 2017-10-23 00:09:47 +0200 (Mon, 23 Oct 2017) | 12 lines
* i386.c (ix86_builtin_vectorization_cost): Use existing rtx_cost
latencies instead of having separate table; make difference between
integer and float costs.
* i386.h (processor_costs): Remove scalar_stmt_cost,
scalar_load_cost, scalar_store_cost, vec_stmt_cost, vec_to_scalar_cost,
scalar_to_vec_cost, vec_align_load_cost, vec_unalign_load_cost,
* x86-tune-costs.h: Remove entries which has been removed in
procesor_costs from all tables; make cond_taken_branch_cost
and cond_not_taken_branch_cost COST_N_INSNS based.
Similarly, the other range (r253996 to r254030) includes
r254012 | hubicka | 2017-10-23 17:10:09 +0200 (Mon, 23 Oct 2017) | 15 lines
* i386.c (dimode_scalar_chain::compute_convert_gain): Use
xmm_move instead of sse_move.
(sse_store_index): New function.
(ix86_register_move_cost): Be more sensible about mismatch stall;
model AVX moves correctly; make difference between sse->integer and
(ix86_builtin_vectorization_cost): Model correctly aligned and
moves; make difference between SSE and AVX.
* i386.h (processor_costs): Remove sse_move; add xmm_move, ymm_move
and zmm_move. Increase size of sse load and store tables;
add unaligned load and store tables; add ssemmx_to_integer.
* x86-tune-costs.h: Update all entries according to real
move latencies from Agner Fog's manual and chip documentation.
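Both commits change the per-statement costs that ix86_builtin_vectorization_cost hands to the vectorizer, so a loop's vectorize/don't-vectorize decision can flip without any change to the loop itself. A deliberately simplified Python sketch of that effect, assuming a toy model (fixed overhead plus per-iteration costs, scalar epilogue for leftovers); it is not GCC's actual profitability logic, which is considerably more detailed, and every cost number below is hypothetical:

```python
from dataclasses import dataclass


@dataclass
class LoopCosts:
    scalar_iter: int  # cost of one scalar iteration
    vector_iter: int  # cost of one vector iteration (handles vf scalar iterations)
    overhead: int     # one-off setup cost (prologue/epilogue, checks)
    vf: int           # vectorization factor (elements per vector)


def body_cost(loads, stores, ops, load_cost, store_cost, op_cost):
    """Sum hypothetical per-statement costs for one loop body."""
    return loads * load_cost + stores * store_cost + ops * op_cost


def vector_is_profitable(c: LoopCosts, niters: int) -> bool:
    """Compare the toy vector and scalar cost estimates for niters iterations."""
    vector_iters, remainder = divmod(niters, c.vf)
    # Leftover iterations are assumed to run in a scalar epilogue.
    vector_total = c.overhead + vector_iters * c.vector_iter + remainder * c.scalar_iter
    scalar_total = niters * c.scalar_iter
    return vector_total < scalar_total


if __name__ == "__main__":
    scalar = body_cost(2, 1, 2, load_cost=1, store_cost=1, op_cost=1)
    cheap = body_cost(2, 1, 2, load_cost=1, store_cost=1, op_cost=1)  # e.g. cheap vector moves
    dear = body_cost(2, 1, 2, load_cost=4, store_cost=4, op_cost=1)   # e.g. costlier vector moves
    for name, vec in (("cheap moves", cheap), ("costly moves", dear)):
        costs = LoopCosts(scalar_iter=scalar, vector_iter=vec, overhead=30, vf=4)
        print(name, vector_is_profitable(costs, niters=16))
```

With 16 iterations the loop is judged profitable under the cheap move costs and unprofitable once vector loads and stores are costed higher; a mis-tuned table can just as well push the decision the other way.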
So at first glance it indeed looks like a target (vectorization) cost-model issue.
Profiling the difference between non-LTO r253982 and r254030 might identify the important loop(s). Note that we did recover performance later.
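One way to do that comparison is to profile both builds with the same event and sampling period and rank symbols by how much their sample counts grew; `perf diff` does a comparable job on two perf.data files. A minimal Python sketch of the ranking step, assuming per-symbol sample counts have already been extracted; the function name and the symbol names in the example are hypothetical placeholders:

```python
def profile_delta(before, after):
    """Per-symbol change in sample counts, largest increase first.

    Counts should come from runs with the same event and sampling period;
    raw counts are compared because percentages shift when total time changes.
    """
    symbols = set(before) | set(after)
    deltas = [(sym, after.get(sym, 0) - before.get(sym, 0)) for sym in symbols]
    # Sort by growth (descending), then by name for a stable order on ties.
    deltas.sort(key=lambda item: (-item[1], item[0]))
    return deltas


if __name__ == "__main__":
    # Hypothetical sample counts; symbol names are placeholders, not cactusADM's.
    old = {"loop_a": 900, "loop_b": 100}
    new = {"loop_a": 1200, "loop_b": 100, "helper_c": 20}
    for sym, delta in profile_delta(old, new):
        print(f"{sym:10s} {delta:+d}")
```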
cactusADM is a bit noisy (see that other PR), but base is now in the range of 51-55, with peak a little higher than that.