[Bug tree-optimization/114485] [13/14 Regression] Wrong code with -O3 -march=rv64gcv on riscv or `-O3 -march=armv9-a` for aarch64

2024-04-04 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114485

--- Comment #12 from GCC Commits ---
The master branch has been updated by Richard Biener:

https://gcc.gnu.org/g:85621f98d245004a6c9787dde21e0acc17ab2c50

commit r14-9786-g85621f98d245004a6c9787dde21e0acc17ab2c50
Author: Richard Biener 
Date:   Thu Apr 4 10:00:51 2024 +0200

    tree-optimization/114485 - neg induction with partial vectors

    We can't use vect_update_ivs_after_vectorizer for partial vectors,
    the following fixes vect_can_peel_nonlinear_iv_p accordingly.

            PR tree-optimization/114485
            * tree-vect-loop-manip.cc (vect_can_peel_nonlinear_iv_p):
            vect_step_op_neg isn't OK for partial vectors but only
            for unknown niter.

            * gcc.dg/vect/pr114485.c: New testcase.
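
As a rough illustration of the rule this ChangeLog entry describes, read together with the source comment quoted in comment #10, the decision can be modelled as below. This is a standalone sketch and not the GCC code: `iv_step_kind`, `can_peel_nonlinear_iv` and its three flags are hypothetical names, and the real vect_can_peel_nonlinear_iv_p checks more conditions than the ones mentioned in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the nonlinear induction kinds named in
   the thread (neg, shift, mult).  Not GCC's own enum.  */
enum iv_step_kind { STEP_NEG, STEP_SHIFT, STEP_MUL };

/* Toy model: may the induction's value after the vector loop be
   recomputed from its initial value?
   - neg: fine with unknown niters, but not with partial vectors.
   - shift/mult: need both niters and VF to be known.  */
static bool
can_peel_nonlinear_iv (enum iv_step_kind kind, bool partial_vectors,
                       bool niters_known, bool vf_constant)
{
  assert (kind == STEP_NEG || kind == STEP_SHIFT || kind == STEP_MUL);
  if (kind == STEP_NEG)
    return !partial_vectors;
  return niters_known && vf_constant;
}
```

Under this model a negated induction in a loop with unknown niters is still accepted as long as partial vectors are not in use, which matches the wording "isn't OK for partial vectors but only for unknown niter".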

2024-04-04 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114485

--- Comment #11 from Richard Biener ---
Created attachment 57871
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57871&action=edit
patch

I'm testing this (on x86_64-linux).

2024-04-04 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114485

--- Comment #10 from Richard Biener ---
  /* Init_expr will be updated by vect_update_ivs_after_vectorizer,
     if niters or vf is unknown:
     For shift, when shift amount >= precision, there would be UB.
     For mult, we don't know how to generate
     init_expr * pow (step, niters) for variable niters.
     For neg, it should be ok, since niters of vectorized main loop
     will always be a multiple of 2.  */

well, for partial vectors that's of course not true.
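
To spell out the parity argument: negating a value n times returns the initial value only when n is even. The sketch below is a simplified model under stated assumptions, not vectorizer code: the function names are made up, VF is assumed to be an even constant, and the partial-vector (masked) loop is assumed to execute all niters itself.

```c
#include <assert.h>
#include <limits.h>

/* Scalar reference: apply c = -c once per iteration.  */
static int
neg_iv_after (int init, unsigned niters)
{
  assert (init != INT_MIN);   /* -INT_MIN would overflow */
  int c = init;
  for (unsigned i = 0; i < niters; i++)
    c = -c;
  return c;
}

/* Scalar iterations covered by the vector main loop in this model.
   Full vectors: niters rounded down to a multiple of VF.
   Partial vectors: the masked loop covers every iteration.  */
static unsigned
main_loop_iters (unsigned niters, unsigned vf, int partial_vectors)
{
  assert (vf >= 2 && vf % 2 == 0);
  return partial_vectors ? niters : niters - niters % vf;
}

/* Nonzero when the induction leaves the main loop holding its
   initial value, which is what the quoted comment relies on.  */
static int
init_survives_main_loop (int init, unsigned niters, unsigned vf,
                         int partial_vectors)
{
  unsigned n = main_loop_iters (niters, vf, partial_vectors);
  return neg_iv_after (init, n) == init;
}
```

With niters = 7 and VF = 4 the full-vector main loop covers 4 iterations and the value is unchanged, while the partial-vector loop covers all 7 and the sign flips, so reusing the initial value afterwards is wrong.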

2024-04-04 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114485

Richard Biener changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
             Status|NEW                           |ASSIGNED

--- Comment #9 from Richard Biener ---
I think vect_update_ivs_after_vectorizer cannot deal at all with a masked loop.

2024-04-03 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114485

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #8 from Richard Biener ---
(In reply to Robin Dapp from comment #4)
> Yes, the vectorization looks ok.  The extracted live values are not used
> afterwards and therefore the whole vectorized loop is being thrown away.
> Then we do one iteration of the epilogue loop, inverting the original c and
> end up with -8 instead of 8.  This is pretty similar to what's happening in
> the related PR.
> 
> We properly populate the phi in question in
> slpeel_update_phi_nodes_for_guard1:
> 
> c_lsm.7_64 = PHI <_56(23), pretmp_34(17)>
> 
> but vect_update_ivs_after_vectorizer changes that into
> 
> c_lsm.7_64 = PHI .
> 
> Just as a test, commenting out
> 
>   if (!LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
>     vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
>                                       update_e);
> 
> at least makes us keep the VEC_EXTRACT and not fail anymore.

I'll note that on x86_64 we do the same and do not fail the testcase.  x86
cannot use partial vectors because we don't implement EXTRACT_LAST,
so that might be the "key" to the failure (partial vectors).  And we
might need to "fail" vectorization of the special inductions when
using partial vectors?

This might also be out-of-sync handling of which ones we handle with
vect_update_ivs_after_vectorizer and which ones with
vectorizable_live_operation - as indeed we do generate the EXTRACT_LAST here.

2024-03-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114485

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112104

--- Comment #7 from Andrew Pinski ---
(In reply to Andrew Pinski from comment #6)
> Note the missed SCCP is filed as PR 114502 (and another bug for the
> non-constant loop bounds case; I don't have the # right now).

PR 112104 for the non-constant loop bounds case.

2024-03-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114485

--- Comment #6 from Andrew Pinski ---
Note the missed SCCP is filed as PR 114502 (and another bug for the
non-constant loop bounds case; I don't have the # right now).

2024-03-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114485

--- Comment #5 from Andrew Pinski ---
*** Bug 114476 has been marked as a duplicate of this bug. ***

2024-03-27 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114485

--- Comment #4 from Robin Dapp ---
Yes, the vectorization looks ok.  The extracted live values are not used
afterwards and therefore the whole vectorized loop is being thrown away.
Then we do one iteration of the epilogue loop, inverting the original c and end
up with -8 instead of 8.  This is pretty similar to what's happening in the
related PR.

We properly populate the phi in question in slpeel_update_phi_nodes_for_guard1:

c_lsm.7_64 = PHI <_56(23), pretmp_34(17)>

but vect_update_ivs_after_vectorizer changes that into

c_lsm.7_64 = PHI .

Just as a test, commenting out

  if (!LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
    vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
                                      update_e);

at least makes us keep the VEC_EXTRACT and not fail anymore.
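
The effect described above can be mimicked with a small scalar model. This is only a sketch: the function names are hypothetical, and the iteration counts in the usage note (three iterations in the vector loop, one in the epilogue) are assumed for illustration, since the thread only states that one epilogue iteration runs and that the result is -8 instead of 8.

```c
#include <assert.h>
#include <limits.h>

/* Apply c = -c n times.  */
static int
negate_n_times (int c, unsigned n)
{
  assert (c != INT_MIN);   /* -INT_MIN would overflow */
  for (unsigned i = 0; i < n; i++)
    c = -c;
  return c;
}

/* What the source loop computes: every iteration negates c.  */
static int
expected_result (int c0, unsigned main_iters, unsigned epilogue_iters)
{
  return negate_n_times (c0, main_iters + epilogue_iters);
}

/* Model of the miscompile: the epilogue's PHI is fed the original c,
   so the vector loop's work is unused and only the epilogue
   iterations are visible in the result.  */
static int
miscompiled_result (int c0, unsigned main_iters, unsigned epilogue_iters)
{
  (void) main_iters;   /* vector loop result is dead in this model */
  return negate_n_times (c0, epilogue_iters);
}
```

With c0 = 8, an odd number of vector-loop iterations and one epilogue iteration, expected_result gives 8 while miscompiled_result gives -8, the same sign error as in the report.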

2024-03-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114485

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2

2024-03-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114485

--- Comment #3 from Richard Biener ---
Huh.

  _75 = [vec_duplicate_expr] pretmp_34;
  _76 = -_75;
  _77 = VEC_PERM_EXPR <_75, _76, { 0, POLY_INT_CST [4, 4], 1, POLY_INT_CST [5, 4], 2, POLY_INT_CST [6, 4], ... }>;

  # c_lsm.7_8 = PHI <_2(9), pretmp_34(19)>
  vect__2.17_79 = -_77;
  _2 = -c_lsm.7_8;

   [local count: 94607391]:
  # i_101 = PHI 
  # vect__2.17_102 = PHI 
  # loop_mask_103 = PHI 
  # vect_iftmp.24_104 = PHI 
  _68 = ni_gap.12_67;
  _93 = .EXTRACT_LAST (loop_mask_103, vect_iftmp.24_104);
  iftmp.1_59 = _93;
  _82 = .EXTRACT_LAST (loop_mask_103, vect__2.17_102);

it looks OK to me?  But maybe the poly-int-cst permute is wrong?  Should
be an interleave.

2024-03-26 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114485

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |12.3.0
   Target Milestone|---                         |13.3
   Last reconfirmed|                            |2024-03-26
             Blocks|                            |53947
             Status|UNCONFIRMED                 |NEW
      Known to fail|                            |13.1.0
            Summary|[14] Wrong code with -O3    |[13/14 Regression] Wrong
                   |-march=rv64gcv on riscv     |code with -O3
                   |                            |-march=rv64gcv on riscv or
                   |                            |`-O3 -march=armv9-a` for
                   |                            |aarch64
     Ever confirmed|0                           |1

--- Comment #2 from Andrew Pinski ---
Confirmed. Yes, it does look very similar, if not the same.
On aarch64 this one does not even need -fno-vect-cost-model or -fwrapv.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations