On Fri, Jun 15, 2012 at 3:13 PM, Ulrich Weigand <[email protected]> wrote:
> Hello,
>
> PR tree-optimization/53636 is a crash due to an invalid unaligned access
> generated by the vectorizer.
>
> The problem is that vect_compute_data_ref_alignment uses DR_ALIGNED_TO
> as computed by the default data-ref analysis to decide whether an access
> is sufficiently aligned for the vectorizer.
>
> However, this analysis computes the scalar evolution relative to the
> innermost loop in which the access takes place; DR_ALIGNED_TO only
> reflects the known alignmnent of the *base* address according to
> that evolution.
>
> Now, if we're actually about to vectorize this particular loop, then
> just checking the alignment of the base is exactly what we need to do
> (subsequent accesses will usually be misaligned, but that's OK since
> we're transforming those into a vector access).
>
> However, if we're actually currently vectorizing something else, this
> test is not sufficient. The code currently already checks for the
> case where we're performing outer-loop vectorization. In this case,
> we need to check alignment of the access on *every* pass through the
> inner loop, as the comment states:
>
> /* In case the dataref is in an inner-loop of the loop that is being
> vectorized (LOOP), we use the base and misalignment information
> relative to the outer-loop (LOOP). This is ok only if the misalignment
> stays the same throughout the execution of the inner-loop, which is why
> we have to check that the stride of the dataref in the inner-loop evenly
> divides by the vector size. */
>
> However, there is a second case where we need to check every pass: if
> we're not actually vectorizing any loop, but are performing basic-block
> SLP. In this case, it would appear that we need the same check as
> described in the comment above, i.e. to verify that the stride is a
> multiple of the vector size.
>
> The patch below adds this check, and this indeed fixes the invalid access
> I was seeing in the test case (in the final assembler, we now get a vld1.16
> instead of vldr).
>
> Tested on arm-linux-gnueabi with no regressions.
>
> OK for mainline?
Ok.
Thanks,
Richard.
> Bye,
> Ulrich
>
>
> ChangeLog:
>
> gcc/
> PR tree-optimization/53636
> * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Verify
> stride when doing basic-block vectorization.
>
> gcc/testsuite/
> PR tree-optimization/53636
> * gcc.target/arm/pr53636.c: New test.
>
>
> === added file 'gcc/testsuite/gcc.target/arm/pr53636.c'
> --- gcc/testsuite/gcc.target/arm/pr53636.c 1970-01-01 00:00:00 +0000
> +++ gcc/testsuite/gcc.target/arm/pr53636.c 2012-06-11 17:31:41 +0000
> @@ -0,0 +1,48 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target arm_neon_hw } */
> +/* { dg-options "-O -ftree-vectorize" } */
> +/* { dg-add-options arm_neon } */
> +
> +void fill (short *buf) __attribute__ ((noinline));
> +void fill (short *buf)
> +{
> + int i;
> +
> + for (i = 0; i < 11 * 8; i++)
> + buf[i] = i;
> +}
> +
> +void test (unsigned char *dst) __attribute__ ((noinline));
> +void test (unsigned char *dst)
> +{
> + short tmp[11 * 8], *tptr;
> + int i;
> +
> + fill (tmp);
> +
> + tptr = tmp;
> + for (i = 0; i < 8; i++)
> + {
> + dst[0] = (-tptr[0] + 9 * tptr[0 + 1] + 9 * tptr[0 + 2] - tptr[0 + 3])
> >> 7;
> + dst[1] = (-tptr[1] + 9 * tptr[1 + 1] + 9 * tptr[1 + 2] - tptr[1 + 3])
> >> 7;
> + dst[2] = (-tptr[2] + 9 * tptr[2 + 1] + 9 * tptr[2 + 2] - tptr[2 + 3])
> >> 7;
> + dst[3] = (-tptr[3] + 9 * tptr[3 + 1] + 9 * tptr[3 + 2] - tptr[3 + 3])
> >> 7;
> + dst[4] = (-tptr[4] + 9 * tptr[4 + 1] + 9 * tptr[4 + 2] - tptr[4 + 3])
> >> 7;
> + dst[5] = (-tptr[5] + 9 * tptr[5 + 1] + 9 * tptr[5 + 2] - tptr[5 + 3])
> >> 7;
> + dst[6] = (-tptr[6] + 9 * tptr[6 + 1] + 9 * tptr[6 + 2] - tptr[6 + 3])
> >> 7;
> + dst[7] = (-tptr[7] + 9 * tptr[7 + 1] + 9 * tptr[7 + 2] - tptr[7 + 3])
> >> 7;
> +
> + dst += 8;
> + tptr += 11;
> + }
> +}
> +
> +int main (void)
> +{
> + char buf [8 * 8];
> +
> + test (buf);
> +
> + return 0;
> +}
> +
>
> === modified file 'gcc/tree-vect-data-refs.c'
> --- gcc/tree-vect-data-refs.c 2012-05-31 08:46:10 +0000
> +++ gcc/tree-vect-data-refs.c 2012-06-11 17:31:41 +0000
> @@ -845,6 +845,24 @@
> }
> }
>
> + /* Similarly, if we're doing basic-block vectorization, we can only use
> + base and misalignment information relative to an innermost loop if the
> + misalignment stays the same throughout the execution of the loop.
> + As above, this is the case if the stride of the dataref evenly divides
> + by the vector size. */
> + if (!loop)
> + {
> + tree step = DR_STEP (dr);
> + HOST_WIDE_INT dr_step = TREE_INT_CST_LOW (step);
> +
> + if (dr_step % GET_MODE_SIZE (TYPE_MODE (vectype)) != 0)
> + {
> + if (vect_print_dump_info (REPORT_ALIGNMENT))
> + fprintf (vect_dump, "SLP: step doesn't divide the vector-size.");
> + misalign = NULL_TREE;
> + }
> + }
> +
> base = build_fold_indirect_ref (base_addr);
> alignment = ssize_int (TYPE_ALIGN (vectype)/BITS_PER_UNIT);
>
>
> --
> Dr. Ulrich Weigand
> GNU Toolchain for Linux on System z and Cell BE
> [email protected]
>