On Thu, Mar 1, 2018 at 12:38 PM, Richard Sandiford <richard.sandif...@linaro.org> wrote: > Richard Biener <richard.guent...@gmail.com> writes: >> On Wed, Feb 28, 2018 at 3:20 PM, Richard Sandiford >> <richard.sandif...@linaro.org> wrote: >>> GCC 6 and 7 would vectorise: >>> >>> void >>> f (unsigned long incx, unsigned long incy, >>> float *restrict dx, float *restrict dy) >>> { >>> unsigned long ix = 0, iy = 0; >>> for (unsigned long i = 0; i < 512; ++i) >>> { >>> dy[iy] += dx[ix]; >>> ix += incx; >>> iy += incy; >>> } >>> } >>> >>> without first proving that incy is nonzero. This is a regression from >>> GCC 5. It was fixed on trunk in r223486, which versioned the loop based >>> on whether incy is zero, but that's obviously too invasive to backport. >>> This patch instead bails out for non-constant steps in the place that >>> trunk would try a check for zeroness. >>> >>> I did wonder about trying to use range information to prove nonzeroness >>> for SSA_NAMEs, but that would be entirely new code and didn't seem >>> suitable for a release branch. >>> >>> Tested on aarch64-linux-gnu. OK for GCC 7 and 6? I'll add the testcase >>> to trunk too. >> >> Given dist == 0 isn't it enough to test either DR_STEP (dra) or DR_STEP >> (drb)? >> That seems what trunk is doing (just look at dr_zero_step_indicator of dra). >> Even when not using range-info I think that using ! >> tree_expr_nonzero_p (DR_STEP (dra)) >> is more to the point of the issue we're fixing -- that also would catch >> integer_zerop (DR_STEP (dra)) which trunk handles by failing as well but your >> patch wouldn't so it looks like a more "complete" fix. > > OK. > >> Last but not least trunk and your patch guards all this by >> !loop->force_vectorize. >> But force_vectorize doesn't give any such guarantee that step isn't >> zero so I wonder >> why we deliberately choose to possibly miscompile stuff here? > > This was based on the pre-existing: > > if (loop_vinfo && integer_zerop (step)) > { > GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = NULL; > if (!nested_in_vect_loop_p (loop, stmt)) > return DR_IS_READ (dr); > /* Allow references with zero step for outer loops marked > with pragma omp simd only - it guarantees absence of > loop-carried dependencies between inner loop iterations. */ > if (!loop->force_vectorize) > { > if (dump_enabled_p ()) > dump_printf_loc (MSG_NOTE, vect_location, > "zero step in inner loop of nest\n"); > return false; > } > } > > AIUI #pragma omp simd really does guarantee that iterations of > the loop can be executed concurrently (up to the limit given by > safelen if present). So something like: > > #pragma omp simd > for (int i = 0; i < n; ++i) > a[i * step] += 1; > > would be incorrect for step==0. (#pragma ordered simd forces > things to be executed in order where necessary.)
But then we should check safelen, not force_vectorize... Richard. > Thanks, > Richard > >> Thus I'd like to see a simpler >> >> + if (! tree_expr_nonzero_p (DR_STEP (dra))) >> + { >> + if (dump_enabled_p ()) >> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, >> + "step could be zero.\n"); >> + return true; >> + } >> >> if you think that works out and also the force_vectorize guard removed from >> the >> trunk version. >> >> OK with that change. >> >> Thanks, >> Richard. >> >>> Richard >>> >>> >>> 2018-02-28 Richard Sandiford <richard.sandif...@linaro.org> >>> >>> gcc/ >>> PR tree-optimization/84485 >>> * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Return >>> true for zero dependence distances if either step is variable, and >>> if >>> there is no metadata that guarantees correctness. >>> >>> gcc/testsuite/ >>> PR tree-optimization/84485 >>> * gcc.dg/vect/pr84485.c: New test. >>> >>> Index: gcc/tree-vect-data-refs.c >>> =================================================================== >>> --- gcc/tree-vect-data-refs.c 2017-07-27 18:08:43.779978373 +0100 >>> +++ gcc/tree-vect-data-refs.c 2018-02-28 14:16:36.621113244 +0000 >>> @@ -394,6 +394,16 @@ vect_analyze_data_ref_dependence (struct >>> } >>> } >>> >>> + if (!loop->force_vectorize >>> + && (TREE_CODE (DR_STEP (dra)) != INTEGER_CST >>> + || TREE_CODE (DR_STEP (drb)) != INTEGER_CST)) >>> + { >>> + if (dump_enabled_p ()) >>> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, >>> + "step could be zero.\n"); >>> + return true; >>> + } >>> + >>> continue; >>> } >>> >>> Index: gcc/testsuite/gcc.dg/vect/pr84485.c >>> =================================================================== >>> --- /dev/null 2018-02-26 10:26:41.624847060 +0000 >>> +++ gcc/testsuite/gcc.dg/vect/pr84485.c 2018-02-28 14:16:36.620112862 +0000 >>> @@ -0,0 +1,34 @@ >>> +/* { dg-do run } */ >>> + >>> +#include "tree-vect.h" >>> + >>> +#define N 256 >>> + >>> +void __attribute__ ((noinline, noclone)) >>> +f (unsigned long incx, unsigned long incy, >>> + float *restrict dx, float *restrict dy) >>> +{ >>> + unsigned long ix = 0, iy = 0; >>> + for (unsigned long i = 0; i < N; ++i) >>> + { >>> + dy[iy] += dx[ix]; >>> + ix += incx; >>> + iy += incy; >>> + } >>> +} >>> + >>> +float a = 0.0; >>> +float b[N]; >>> + >>> +int >>> +main (void) >>> +{ >>> + check_vect (); >>> + >>> + for (int i = 0; i < N; ++i) >>> + b[i] = i; >>> + f (1, 0, b, &a); >>> + if (a != N * (N - 1) / 2) >>> + __builtin_abort (); >>> + return 0; >>> +}