[Bug tree-optimization/114485] [13/14 Regression] Wrong code with -O3 -march=rv64gcv on riscv or `-O3 -march=armv9-a` for aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114485

--- Comment #12 from GCC Commits ---
The master branch has been updated by Richard Biener:

https://gcc.gnu.org/g:85621f98d245004a6c9787dde21e0acc17ab2c50

commit r14-9786-g85621f98d245004a6c9787dde21e0acc17ab2c50
Author: Richard Biener
Date:   Thu Apr 4 10:00:51 2024 +0200

    tree-optimization/114485 - neg induction with partial vectors

    We can't use vect_update_ivs_after_vectorizer for partial vectors,
    the following fixes vect_can_peel_nonlinear_iv_p accordingly.

            PR tree-optimization/114485
            * tree-vect-loop-manip.cc (vect_can_peel_nonlinear_iv_p):
            vect_step_op_neg isn't OK for partial vectors but only
            for unknown niter.

            * gcc.dg/vect/pr114485.c: New testcase.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114485

--- Comment #11 from Richard Biener ---
Created attachment 57871
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57871&action=edit
patch

I'm testing this (on x86_64-linux).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114485

--- Comment #10 from Richard Biener ---
  /* Init_expr will be update by vect_update_ivs_after_vectorizer,
     if niters or vf is unkown:
     For shift, when shift mount >= precision, there would be UD.
     For mult, don't known how to generate
     init_expr * pow (step, niters) for variable niters.
     For neg, it should be ok, since niters of vectorized main loop
     will always be multiple of 2.

Well, for partial vectors that's of course not true.
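The parity argument in the quoted source comment can be made concrete with a tiny model. This is a sketch under stated assumptions (a fixed integer trip count, a compile-time VF, non-negative counts); `neg_iv_after`, `full_vector_iters` and `shortcut_is_correct` are hypothetical helpers, not GCC functions. With full vectors and an even VF the number of scalar iterations covered by the vector loop is always even, so treating the value that enters the epilogue as the initial value is correct. With partial vectors the loop can also cover a final, partially populated vector, so the covered count can be odd (for example 7 iterations with VF 4, processed as 4 + 3 lanes).

```c
#include <stdbool.h>

/* A negating induction c_{k+1} = -c_k has the closed form
   c_k = (-1)^k * c_0, so only the parity of k matters.
   (Assumes init != INT_MIN, since negating INT_MIN overflows.)  */
static int
neg_iv_after (int init, long k)
{
  return (k % 2 == 0) ? init : -init;
}

/* Scalar iterations covered by a vector loop that uses only full vectors:
   the largest multiple of vf not exceeding niters; the remainder is left
   to a scalar epilogue.  */
static long
full_vector_iters (long niters, long vf)
{
  return niters - niters % vf;
}

/* The shortcut from the source comment: treat the value entering the
   epilogue as the unchanged initial value.  It is correct exactly when
   the vector loop covered an even number of scalar iterations.  */
static bool
shortcut_is_correct (int init, long iters_covered)
{
  return neg_iv_after (init, iters_covered) == init;
}
```

For niters = 7 and VF = 4, `shortcut_is_correct (8, full_vector_iters (7, 4))` holds (4 iterations covered), while `shortcut_is_correct (8, 7)` does not, because the real value after 7 negations is -8.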
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114485

Richard Biener changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
             Status|NEW                           |ASSIGNED

--- Comment #9 from Richard Biener ---
I think vect_update_ivs_after_vectorizer cannot deal at all with a masked loop.
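Comment #9's point is that in a masked loop every lane is predicated on its own scalar index, so the scalar count the loop covers is niters itself rather than some multiple of VF. A minimal sketch of how per-iteration loop masks are formed, assuming a fixed VF of 4 (the RVV and SVE targets in this PR use a runtime-variable, poly-int VF, which this does not model); all names are hypothetical.

```c
#include <stdbool.h>

#define VF 4 /* illustrative fixed vectorization factor */

/* Lane i of the mask for the vector iteration that starts at scalar index
   base is active iff base + i < niters.  */
static void
loop_mask (long base, long niters, bool mask[VF])
{
  for (int i = 0; i < VF; i++)
    mask[i] = base + i < niters;
}

/* A fully masked loop runs ceil (niters / VF) vector iterations and needs
   no scalar epilogue.  */
static long
masked_vector_iters (long niters)
{
  return niters <= 0 ? 0 : (niters + VF - 1) / VF;
}

/* Active lanes in the final vector iteration (0 if the loop does not run). */
static int
active_lanes_last (long niters)
{
  long iters = masked_vector_iters (niters);
  if (iters == 0)
    return 0;

  bool mask[VF];
  loop_mask ((iters - 1) * VF, niters, mask);

  int active = 0;
  for (int i = 0; i < VF; i++)
    active += mask[i];
  return active;
}
```

With niters = 7 the loop runs two vector iterations and the second one has only three active lanes, so any fix-up that assumes "vector iterations × VF" scalar iterations is off by one negation.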
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114485

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #8 from Richard Biener ---
(In reply to Robin Dapp from comment #4)
> Yes, the vectorization looks ok.  The extracted live values are not used
> afterwards and therefore the whole vectorized loop is being thrown away.
> Then we do one iteration of the epilogue loop, inverting the original c and
> end up with -8 instead of 8.  This is pretty similar to what's happening in
> the related PR.
>
> We properly populate the phi in question in
> slpeel_update_phi_nodes_for_guard1:
>
>   c_lsm.7_64 = PHI <_56(23), pretmp_34(17)>
>
> but vect_update_ivs_after_vectorizer changes that into
>
>   c_lsm.7_64 = PHI .
>
> Just as a test, commenting out
>
>   if (!LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
>     vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
>                                       update_e);
>
> at least makes us keep the VEC_EXTRACT and not fail anymore.

I'll note that on x86_64 we do the same and don't fail the testcase.  x86
cannot use partial vectors because we don't implement EXTRACT_LAST, so that
might be the "key" to the failure (partial vectors).  And we might need to
"fail" vectorization of the special inductions when using them?

This might also be out-of-sync handling of which ones we handle with
vect_update_ivs_after_vectorizer and which ones with
vectorizable_live_operation - as indeed we do generate the EXTRACT_LAST here.
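With partial vectors, the live value of the negating induction after the loop has to come from the last active lane of the final vector iteration, not from a fixed lane, which is the job EXTRACT_LAST does. A scalar model of that operation, assuming a fixed lane count; `extract_last` is a hypothetical name, this is not GCC's implementation of `.EXTRACT_LAST`, and the fallback for an all-false mask is purely an assumption of the model.

```c
#include <stdbool.h>
#include <stddef.h>

/* Scalar model of "extract the last active element": return vec[i] for
   the highest lane i whose mask bit is set.  The fallback for an all-false
   mask is an assumption of this model only.  */
static int
extract_last (const bool *mask, const int *vec, size_t lanes, int fallback)
{
  for (size_t i = lanes; i-- > 0;)
    if (mask[i])
      return vec[i];
  return fallback;
}
```

If the lanes hold c after 1, 2, 3 and 4 negations of 8, i.e. { -8, 8, -8, 8 }, a mask of { 1, 1, 1, 0 } (three scalar iterations left) selects -8, while a full mask selects 8.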
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114485

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112104

--- Comment #7 from Andrew Pinski ---
(In reply to Andrew Pinski from comment #6)
> Note the missed SCCP is filed as PR 114502 (and another bug for the
> non-constant loop bounds case; I don't have the # right now).

PR 112104 for the non-constant loop bounds case.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114485

--- Comment #6 from Andrew Pinski ---
Note the missed SCCP is filed as PR 114502 (and another bug for the
non-constant loop bounds case; I don't have the # right now).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114485

--- Comment #5 from Andrew Pinski ---
*** Bug 114476 has been marked as a duplicate of this bug. ***
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114485

--- Comment #4 from Robin Dapp ---
Yes, the vectorization looks ok.  The extracted live values are not used
afterwards and therefore the whole vectorized loop is being thrown away.
Then we do one iteration of the epilogue loop, inverting the original c and
end up with -8 instead of 8.  This is pretty similar to what's happening in
the related PR.

We properly populate the phi in question in
slpeel_update_phi_nodes_for_guard1:

  c_lsm.7_64 = PHI <_56(23), pretmp_34(17)>

but vect_update_ivs_after_vectorizer changes that into

  c_lsm.7_64 = PHI .

Just as a test, commenting out

  if (!LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
    vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
                                      update_e);

at least makes us keep the VEC_EXTRACT and not fail anymore.
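The numbers in comment #4 (8 expected, -8 produced) come from a negating induction whose final value is used after the loop. A hypothetical loop of that shape, for illustration only: it is not the gcc.dg/vect/pr114485.c testcase, and it is not verified here to reproduce the miscompile with `-O3 -march=rv64gcv`. Its scalar semantics fix the invariant the wrong code breaks: after an even trip count, c must be back at its initial value.

```c
/* Hypothetical loop with the shape discussed in this thread: the global c
   is negated on every iteration (presumably the kind of induction the
   vectorizer classifies as vect_step_op_neg) and its final value is live
   after the loop.  NOT the actual PR testcase.  */
int c = 8;
int a[64];

__attribute__ ((noinline)) void
negate_loop (int n)
{
  for (int i = 0; i < n; i++) /* callers must keep n <= 64 */
    {
      c = -c;
      a[i] = c;
    }
}
```

Calling `negate_loop (16)` must leave c equal to 8; a further `negate_loop (3)` must leave it at -8.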
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114485

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114485

--- Comment #3 from Richard Biener ---
Huh.

  _75 = [vec_duplicate_expr] pretmp_34;
  _76 = -_75;
  _77 = VEC_PERM_EXPR <_75, _76, { 0, POLY_INT_CST [4, 4], 1, POLY_INT_CST [5, 4], 2, POLY_INT_CST [6, 4], ... }>;

  # c_lsm.7_8 = PHI <_2(9), pretmp_34(19)>
  vect__2.17_79 = -_77;
  _2 = -c_lsm.7_8;

   [local count: 94607391]:
  # i_101 = PHI
  # vect__2.17_102 = PHI
  # loop_mask_103 = PHI
  # vect_iftmp.24_104 = PHI
  _68 = ni_gap.12_67;
  _93 = .EXTRACT_LAST (loop_mask_103, vect_iftmp.24_104);
  iftmp.1_59 = _93;
  _82 = .EXTRACT_LAST (loop_mask_103, vect__2.17_102);

It looks OK to me?  But maybe the poly-int-cst permute is wrong?  Should be
an interleave.
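In the dump, `POLY_INT_CST [4, 4]` is the runtime lane count N = 4 + 4x, so the selector { 0, N, 1, N + 1, 2, N + 2, ... } alternately takes lane k of `_75` (the splat of c) and lane k of `_76` (its negation). That is an interleave of the low halves, giving { c, -c, c, -c, ... }. A scalar model with a fixed n = 4 (the x = 0 case) of VEC_PERM_EXPR's two-input semantics, where selector values index the concatenation of both inputs; `vec_perm` and `interleave_lo_sel` are hypothetical helpers, not GCC code.

```c
#include <stddef.h>

/* Scalar model of a two-input permute over n-lane vectors: selector value
   s < n picks a[s], otherwise b[s - n] (the inputs are concatenated).  */
static void
vec_perm (const int *a, const int *b, const size_t *sel, int *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = sel[i] < n ? a[sel[i]] : b[sel[i] - n];
}

/* Selector { 0, n, 1, n + 1, 2, n + 2, ... }: interleave the low halves
   of a and b.  */
static void
interleave_lo_sel (size_t *sel, size_t n)
{
  for (size_t i = 0; i < n; i++)
    sel[i] = (i % 2 == 0) ? i / 2 : n + i / 2;
}
```

With a = { 8, 8, 8, 8 } and b = { -8, -8, -8, -8 } the result is { 8, -8, 8, -8 }, the per-lane initial values of the negating induction, which matches the interleave Richard expects.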
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114485

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |12.3.0
   Target Milestone|---                         |13.3
   Last reconfirmed|                            |2024-03-26
             Blocks|                            |53947
             Status|UNCONFIRMED                 |NEW
      Known to fail|                            |13.1.0
            Summary|[14] Wrong code with -O3    |[13/14 Regression] Wrong
                   |-march=rv64gcv on riscv     |code with -O3
                   |                            |-march=rv64gcv on riscv or
                   |                            |`-O3 -march=armv9-a` for
                   |                            |aarch64
     Ever confirmed|0                           |1

--- Comment #2 from Andrew Pinski ---
Confirmed.  Yes, it does look very similar, if not the same.  This one doesn't
even need -fno-vect-cost-model or -fwrapv for aarch64.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations