On Fri, Jun 15, 2012 at 3:13 PM, Ulrich Weigand <uweig...@de.ibm.com> wrote:
> Hello,
>
> PR tree-optimization/53636 is a crash due to an invalid unaligned access
> generated by the vectorizer.
>
> The problem is that vect_compute_data_ref_alignment uses DR_ALIGNED_TO
> as computed by the default data-ref analysis to decide whether an access
> is sufficiently aligned for the vectorizer.
>
> However, this analysis computes the scalar evolution relative to the
> innermost loop in which the access takes place; DR_ALIGNED_TO only
> reflects the known alignmnent of the *base* address according to
> that evolution.
>
> Now, if we're actually about to vectorize this particular loop, then
> just checking the alignment of the base is exactly what we need to do
> (subsequent accesses will usually be misaligned, but that's OK since
> we're transforming those into a vector access).
>
> However, if we're actually currently vectorizing something else, this
> test is not sufficient.  The code currently already checks for the
> case where we're performing outer-loop vectorization.  In this case,
> we need to check alignment of the access on *every* pass through the
> inner loop, as the comment states:
>
>  /* In case the dataref is in an inner-loop of the loop that is being
>     vectorized (LOOP), we use the base and misalignment information
>     relative to the outer-loop (LOOP).  This is ok only if the misalignment
>     stays the same throughout the execution of the inner-loop, which is why
>     we have to check that the stride of the dataref in the inner-loop evenly
>     divides by the vector size.  */
>
> However, there is a second case where we need to check every pass: if
> we're not actually vectorizing any loop, but are performing basic-block
> SLP.  In this case, it would appear that we need the same check as
> described in the comment above, i.e. to verify that the stride is a
> multiple of the vector size.
>
> The patch below adds this check, and this indeed fixes the invalid access
> I was seeing in the test case (in the final assembler, we now get a vld1.16
> instead of vldr).
>
> Tested on arm-linux-gnueabi with no regressions.
>
> OK for mainline?

Ok.

Thanks,
Richard.

> Bye,
> Ulrich
>
>
> ChangeLog:
>
>        gcc/
>        PR tree-optimization/53636
>        * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Verify
>        stride when doing basic-block vectorization.
>
>        gcc/testsuite/
>        PR tree-optimization/53636
>        * gcc.target/arm/pr53636.c: New test.
>
>
> === added file 'gcc/testsuite/gcc.target/arm/pr53636.c'
> --- gcc/testsuite/gcc.target/arm/pr53636.c      1970-01-01 00:00:00 +0000
> +++ gcc/testsuite/gcc.target/arm/pr53636.c      2012-06-11 17:31:41 +0000
> @@ -0,0 +1,48 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target arm_neon_hw } */
> +/* { dg-options "-O -ftree-vectorize" } */
> +/* { dg-add-options arm_neon } */
> +
> +void fill (short *buf) __attribute__ ((noinline));
> +void fill (short *buf)
> +{
> +  int i;
> +
> +  for (i = 0; i < 11 * 8; i++)
> +    buf[i] = i;
> +}
> +
> +void test (unsigned char *dst) __attribute__ ((noinline));
> +void test (unsigned char *dst)
> +{
> +  short tmp[11 * 8], *tptr;
> +  int i;
> +
> +  fill (tmp);
> +
> +  tptr = tmp;
> +  for (i = 0; i < 8; i++)
> +    {
> +      dst[0] = (-tptr[0] + 9 * tptr[0 + 1] + 9 * tptr[0 + 2] - tptr[0 + 3]) 
> >> 7;
> +      dst[1] = (-tptr[1] + 9 * tptr[1 + 1] + 9 * tptr[1 + 2] - tptr[1 + 3]) 
> >> 7;
> +      dst[2] = (-tptr[2] + 9 * tptr[2 + 1] + 9 * tptr[2 + 2] - tptr[2 + 3]) 
> >> 7;
> +      dst[3] = (-tptr[3] + 9 * tptr[3 + 1] + 9 * tptr[3 + 2] - tptr[3 + 3]) 
> >> 7;
> +      dst[4] = (-tptr[4] + 9 * tptr[4 + 1] + 9 * tptr[4 + 2] - tptr[4 + 3]) 
> >> 7;
> +      dst[5] = (-tptr[5] + 9 * tptr[5 + 1] + 9 * tptr[5 + 2] - tptr[5 + 3]) 
> >> 7;
> +      dst[6] = (-tptr[6] + 9 * tptr[6 + 1] + 9 * tptr[6 + 2] - tptr[6 + 3]) 
> >> 7;
> +      dst[7] = (-tptr[7] + 9 * tptr[7 + 1] + 9 * tptr[7 + 2] - tptr[7 + 3]) 
> >> 7;
> +
> +      dst += 8;
> +      tptr += 11;
> +    }
> +}
> +
> +int main (void)
> +{
> +  char buf [8 * 8];
> +
> +  test (buf);
> +
> +  return 0;
> +}
> +
>
> === modified file 'gcc/tree-vect-data-refs.c'
> --- gcc/tree-vect-data-refs.c   2012-05-31 08:46:10 +0000
> +++ gcc/tree-vect-data-refs.c   2012-06-11 17:31:41 +0000
> @@ -845,6 +845,24 @@
>        }
>     }
>
> +  /* Similarly, if we're doing basic-block vectorization, we can only use
> +     base and misalignment information relative to an innermost loop if the
> +     misalignment stays the same throughout the execution of the loop.
> +     As above, this is the case if the stride of the dataref evenly divides
> +     by the vector size.  */
> +  if (!loop)
> +    {
> +      tree step = DR_STEP (dr);
> +      HOST_WIDE_INT dr_step = TREE_INT_CST_LOW (step);
> +
> +      if (dr_step % GET_MODE_SIZE (TYPE_MODE (vectype)) != 0)
> +       {
> +         if (vect_print_dump_info (REPORT_ALIGNMENT))
> +           fprintf (vect_dump, "SLP: step doesn't divide the vector-size.");
> +         misalign = NULL_TREE;
> +       }
> +    }
> +
>   base = build_fold_indirect_ref (base_addr);
>   alignment = ssize_int (TYPE_ALIGN (vectype)/BITS_PER_UNIT);
>
>
> --
>  Dr. Ulrich Weigand
>  GNU Toolchain for Linux on System z and Cell BE
>  ulrich.weig...@de.ibm.com
>

Reply via email to