Richard Biener <richard.guent...@gmail.com> writes: > On Thu, Mar 1, 2018 at 12:38 PM, Richard Sandiford > <richard.sandif...@linaro.org> wrote: >> Richard Biener <richard.guent...@gmail.com> writes: >>> On Wed, Feb 28, 2018 at 3:20 PM, Richard Sandiford >>> <richard.sandif...@linaro.org> wrote: >>>> GCC 6 and 7 would vectorise: >>>> >>>> void >>>> f (unsigned long incx, unsigned long incy, >>>> float *restrict dx, float *restrict dy) >>>> { >>>> unsigned long ix = 0, iy = 0; >>>> for (unsigned long i = 0; i < 512; ++i) >>>> { >>>> dy[iy] += dx[ix]; >>>> ix += incx; >>>> iy += incy; >>>> } >>>> } >>>> >>>> without first proving that incy is nonzero. This is a regression from >>>> GCC 5. It was fixed on trunk in r223486, which versioned the loop based >>>> on whether incy is zero, but that's obviously too invasive to backport. >>>> This patch instead bails out for non-constant steps in the place that >>>> trunk would try a check for zeroness. >>>> >>>> I did wonder about trying to use range information to prove nonzeroness >>>> for SSA_NAMEs, but that would be entirely new code and didn't seem >>>> suitable for a release branch. >>>> >>>> Tested on aarch64-linux-gnu. OK for GCC 7 and 6? I'll add the testcase >>>> to trunk too. >>> >>> Given dist == 0 isn't it enough to test either DR_STEP (dra) or >>> DR_STEP (drb)? >>> That seems what trunk is doing (just look at dr_zero_step_indicator of dra). >>> Even when not using range-info I think that using ! >>> tree_expr_nonzero_p (DR_STEP (dra)) >>> is more to the point of the issue we're fixing -- that also would catch >>> integer_zerop (DR_STEP (dra)) which trunk handles by failing as well but >>> your >>> patch wouldn't so it looks like a more "complete" fix. >> >> OK. >> >>> Last but not least trunk and your patch guards all this by >>> !loop->force_vectorize. >>> But force_vectorize doesn't give any such guarantee that step isn't >>> zero so I wonder >>> why we deliberately choose to possibly miscompile stuff here? >> >> This was based on the pre-existing: >> >> if (loop_vinfo && integer_zerop (step)) >> { >> GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = NULL; >> if (!nested_in_vect_loop_p (loop, stmt)) >> return DR_IS_READ (dr); >> /* Allow references with zero step for outer loops marked >> with pragma omp simd only - it guarantees absence of >> loop-carried dependencies between inner loop iterations. */ >> if (!loop->force_vectorize) >> { >> if (dump_enabled_p ()) >> dump_printf_loc (MSG_NOTE, vect_location, >> "zero step in inner loop of nest\n"); >> return false; >> } >> } >> >> AIUI #pragma omp simd really does guarantee that iterations of >> the loop can be executed concurrently (up to the limit given by >> safelen if present). So something like: >> >> #pragma omp simd >> for (int i = 0; i < n; ++i) >> a[i * step] += 1; >> >> would be incorrect for step==0. (#pragma ordered simd forces >> things to be executed in order where necessary.) > > But then we should check safelen, not force_vectorize...
OK. Following on from irc, this version uses expr_not_equal_to instead of tree_expr_nonzero_p. It also adds safelen to the existing use of force_vectorize (which seemed safer than replacing it). Tested on aarch64-linux-gnu. OK to install? Richard 2018-02-28 Richard Sandiford <richard.sandif...@linaro.org> gcc/ PR tree-optimization/84485 * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Return true for zero dependence distances if the step might be zero, and if there is no metadata that guarantees correctness. (vect_analyze_data_ref_access): Check safelen as well as force_vectorize. gcc/testsuite/ PR tree-optimization/84485 * gcc.dg/vect/pr84485.c: New test. Index: gcc/tree-vect-data-refs.c =================================================================== --- gcc/tree-vect-data-refs.c 2018-02-28 14:19:22.326678337 +0000 +++ gcc/tree-vect-data-refs.c 2018-03-02 13:28:06.339321095 +0000 @@ -394,6 +394,16 @@ vect_analyze_data_ref_dependence (struct } } + unsigned int step_prec = TYPE_PRECISION (TREE_TYPE (DR_STEP (dra))); + if (loop->safelen < 2 + && !expr_not_equal_to (DR_STEP (dra), wi::zero (step_prec))) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "step could be zero.\n"); + return true; + } + continue; } @@ -2515,7 +2525,7 @@ vect_analyze_data_ref_access (struct dat /* Allow references with zero step for outer loops marked with pragma omp simd only - it guarantees absence of loop-carried dependencies between inner loop iterations. */ - if (!loop->force_vectorize) + if (!loop->force_vectorize || loop->safelen < 2) { if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, Index: gcc/testsuite/gcc.dg/vect/pr84485.c =================================================================== --- /dev/null 2018-03-01 08:17:49.562264353 +0000 +++ gcc/testsuite/gcc.dg/vect/pr84485.c 2018-03-02 13:28:06.338321095 +0000 @@ -0,0 +1,34 @@ +/* { dg-do run } */ + +#include "tree-vect.h" + +#define N 256 + +void __attribute__ ((noinline, noclone)) +f (unsigned long incx, unsigned long incy, + float *restrict dx, float *restrict dy) +{ + unsigned long ix = 0, iy = 0; + for (unsigned long i = 0; i < N; ++i) + { + dy[iy] += dx[ix]; + ix += incx; + iy += incy; + } +} + +float a = 0.0; +float b[N]; + +int +main (void) +{ + check_vect (); + + for (int i = 0; i < N; ++i) + b[i] = i; + f (1, 0, b, &a); + if (a != N * (N - 1) / 2) + __builtin_abort (); + return 0; +}