The testcase attached to the PR shows that, for some reason, the test
openmp_vv.sum, when doing OpenMP offloading, creates an intermediate
empty block after the skip_epilog split.

This means we should simply delay setting update_e for the
non-early-break case. For early break we still have to set it early,
otherwise the skip_epilog edge would make us find the wrong edge.

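To illustrate the ordering hazard, here is a minimal sketch using a toy CFG. It is not GCC's CFG API: ToyCfg, entry_edge, split_edge and the two helper functions are hypothetical names made up for this example. It only shows that an edge looked up before a split can point at the newly inserted block afterwards, while a lookup done after the split finds the edge that really enters the target block.

```cpp
#include <cassert>
#include <vector>

// Toy model only: blocks are integer ids and edges are (src, dest) pairs.
struct Edge { int src; int dest; };

struct ToyCfg {
  std::vector<Edge> edges;

  // Return the index of the first edge entering DEST, or -1 if none.
  int entry_edge(int dest) const {
    for (int i = 0; i < static_cast<int>(edges.size()); ++i)
      if (edges[i].dest == dest)
        return i;
    return -1;
  }

  // Split edge IDX by inserting NEW_BLOCK on it: src->dest becomes
  // src->NEW_BLOCK, and a new edge NEW_BLOCK->dest is appended.
  void split_edge(int idx, int new_block) {
    int old_dest = edges[idx].dest;
    edges[idx].dest = new_block;
    edges.push_back({new_block, old_dest});
  }
};

// Block 0: vector loop exit, block 1: epilog header, block 2: block
// created by the split (standing in for the intermediate empty block).

// Look up the edge into the epilog header BEFORE the split, then split.
// The remembered edge now ends in the new block, not the epilog header.
int dest_when_recorded_before_split() {
  ToyCfg cfg{{{0, 1}}};
  int update_e = cfg.entry_edge(1);
  cfg.split_edge(update_e, 2);
  return cfg.edges[update_e].dest;
}

// Split first, then look up the edge into the epilog header.
// The edge found is the one that really enters the epilog header.
int dest_when_recorded_after_split() {
  ToyCfg cfg{{{0, 1}}};
  cfg.split_edge(cfg.entry_edge(1), 2);
  int update_e = cfg.entry_edge(1);
  return cfg.edges[update_e].dest;
}
```

In this toy model the early lookup ends up targeting block 2 and the late lookup targets block 1, which is the same reason the patch moves the non-early-break assignment to after the split.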
I haven't been able to replicate this on a C testcase, and the
attached reduction works fine on AArch64 and x86_64, but I have
been able to verify the fixed code with

./gcc/f951 -fopenmp test2.f90 -O3 -o - -march=sm_30

on an --target=nvptx-none --enable-as-accelerator-for=x86_64-pc-linux-gnu
configured cc1.

If I manage to create a testcase I will push it too.

Bootstrapped and regtested on aarch64-none-linux-gnu,
arm-none-linux-gnueabihf and x86_64-pc-linux-gnu
(-m32 and -m64) with no issues.

Pushed to master.

Thanks,
Tamar

gcc/ChangeLog:

	PR middle-end/122959
	* tree-vect-loop-manip.cc (vect_do_peeling): Delay setting update_e.

---
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 43847c4c3fbdbe7b8364d30e0b614b39cbabf367..18af6a1811ac78ff59201243ac4e1ab72b5aa3d4 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -3580,7 +3580,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
/* Update IVs of original loop as if they were advanced by
niters_vector_mult_vf steps. */
gcc_checking_assert (vect_can_advance_ivs_p (loop_vinfo));
- update_e = skip_vector ? e : loop_preheader_edge (epilog);
if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
update_e = single_succ_edge (LOOP_VINFO_IV_EXIT (loop_vinfo)->dest);
@@ -3639,6 +3638,11 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
scale_loop_profile (epilog, prob_epilog, -1);
}
+ /* Identify the right foward edge for the non-early-break case which must
+ be done after splitting the epilog edge. */
+ if (!LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+ update_e = skip_vector ? e : loop_preheader_edge (epilog);
+
/* If we have a peeled vector iteration, all exits are the same, leave it
and so the main exit needs to be treated the same as the alternative
exits in that we leave their updates to vectorizable_live_operations.