On June 30, 2020 4:23:03 PM GMT+02:00, Richard Sandiford 
<richard.sandif...@arm.com> wrote:
>Richard Biener <rguent...@suse.de> writes:
>> On Tue, 30 Jun 2020, Richard Sandiford wrote:
>>
>>> Richard Biener <richard.guent...@gmail.com> writes:
>>> > On Tue, Jun 30, 2020 at 2:18 PM Richard Sandiford
>>> > <richard.sandif...@arm.com> wrote:
>>> >>
>>> >> "Yangfei (Felix)" <felix.y...@huawei.com> writes:
>>> >> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
>b/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
>>> >> > index e050db1a2e4..ea39fcac0e0 100644
>>> >> > --- a/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
>>> >> > +++ b/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
>>> >> > @@ -1,6 +1,7 @@
>>> >> >  /* { dg-do compile } */
>>> >> >  /* { dg-additional-options "-O3" } */
>>> >> >  /* { dg-additional-options "-mavx2" { target { i?86-*-*
>x86_64-*-* } } } */
>>> >> > +/* { dg-additional-options "-march=armv8.2-a+sve
>-fno-vect-cost-model" { target aarch64*-*-* } } */
>>> >> >
>>> >> >  typedef struct {
>>> >> >      unsigned short mprr_2[5][16][16];
>>> >>
>>> >> This test is useful for Advanced SIMD too, so I think we should
>continue
>>> >> to test it with whatever options the person running the testsuite
>chose.
>>> >> Instead we could duplicate the test in gcc.target/aarch64/sve
>with
>>> >> appropriate options.
>>> >>
>>> >> > diff --git a/gcc/tree-vect-data-refs.c
>b/gcc/tree-vect-data-refs.c
>>> >> > index eb8288e7a85..b30a7d8a3bb 100644
>>> >> > --- a/gcc/tree-vect-data-refs.c
>>> >> > +++ b/gcc/tree-vect-data-refs.c
>>> >> > @@ -1823,8 +1823,11 @@ vect_enhance_data_refs_alignment
>(loop_vec_info loop_vinfo)
>>> >> >               {
>>> >> >                 poly_uint64 nscalars = (STMT_SLP_TYPE
>(stmt_info)
>>> >> >                                         ? vf * DR_GROUP_SIZE
>(stmt_info) : vf);
>>> >> > -               possible_npeel_number
>>> >> > -                 = vect_get_num_vectors (nscalars, vectype);
>>> >> > +               if (maybe_lt (nscalars, TYPE_VECTOR_SUBPARTS
>(vectype)))
>>> >> > +                 possible_npeel_number = 0;
>>> >> > +               else
>>> >> > +                 possible_npeel_number
>>> >> > +                   = vect_get_num_vectors (nscalars, vectype);
>>> >> >
>>> >> >                 /* NPEEL_TMP is 0 when there is no
>misalignment, but also
>>> >> >                    allow peeling NELEMENTS.  */
>>> >>
>>> >> OK, so this is coming from:
>>> >>
>>> >>   int s[16][2];
>>> >>   …
>>> >>   … =s[j][1];
>>> >>
>>> >> and an SLP node containing 16 instances of “s[j][1]”.  The
>DR_GROUP_SIZE
>>> >> is 2 because that's the inner dimension of “s”.
>>> >>
>>> >> I don't think maybe_lt is right here though.  The same problem
>could in
>>> >> principle happen for cases in which NSCALARS >
>TYPE_VECTOR_SUBPARTS,
>>> >> e.g. for different inner dimensions of “s”.
>>> >>
>>> >> I think the testcase shows that using DR_GROUP_SIZE in this
>calculation
>>> >> is flawed.  I'm not sure whether we can really do better given
>the current
>>> >> representation though.  This is one case where having a separate
>dr_vec_info
>>> >> per SLP node would help.
>>> >
>>> > I guess what the code likes to know is what we now have in
>SLP_TREE_LANES
>>> > (or formerly group_size) but that's not necessarily connected to
>DR_GROUP_SIZE.
>>> > Given we only see a stmt_info here and there's no 1:1 mapping of
>SLP node
>>> > to stmt_info (and the reverse mapping doesn't even exist) I do not
>have
>>> > any good idea either.
>>> >
>>> > Honestly I don't really see what this code tries to do ... doesn't
>it
>>> > simply want to set possible_npeel_number to TYPE_VECTOR_SUBPARTS
>(vectype)?!
>>> 
>>> I think it's trying to set possible_npeel_number to the number of
>>> vector stmts per iteration (i.e. ncopies for non-SLP stuff):
>>> 
>>>               /* For multiple types, it is possible that the bigger
>type access
>>>                  will have more than one peeling option.  E.g., a
>loop with two
>>>                  types: one of size (vector size / 4), and the other
>one of
>>>                  size (vector size / 8).  Vectorization factor will
>8.  If both
>>>                  accesses are misaligned by 3, the first one needs
>one scalar
>>>                  iteration to be aligned, and the second one needs
>5.  But the
>>>                  first one will be aligned also by peeling 5 scalar
>>>                  iterations, and in that case both accesses will be
>aligned.
>>>                  Hence, except for the immediate peeling amount, we
>also want
>>>                  to try to add full vector size, while we don't
>exceed
>>>                  vectorization factor.
>>>                  We do this automatically for cost model, since we
>calculate
>>>                  cost for every peeling option.  */
>>> 
>>> So for this example, possible_npeel_number==2 for the first access
>>> (letting us try two peeling values for it) and
>possible_npeel_number==1
>>> for the second.
>>
>> Yeah, but npeel is _scalar_ lanes, since we always peel scalar
>iterations.
>> So it seems odd to somehow put in the number of vectors...  so to me
>> it would have made sense if it did
>>
>>   possible_npeel_number = lower_bound (nscalars);
>>
>> or whateveris necessary to make the polys happy.  Thus simply elide
>> the vect_get_num_vectors call?  But it's been very long since I've
>> dived into the alignment peeling code...
>
>Ah, I see what you mean.  So rather than:
>
>             /* Save info about DR in the hash table.  Also include peeling
>                amounts according to the explanation above.  */
>              for (j = 0; j < possible_npeel_number; j++)
>                {
>                  vect_peeling_hash_insert (&peeling_htab, loop_vinfo,
>                                           dr_info, npeel_tmp);
>                 npeel_tmp += target_align / dr_size;
>                }
>
>just have something like:
>
>             while (known_le (npeel_tmp, nscalars))
>               {
>                 …
>               }
>
>?

Yeah. 

Richard. 

Reply via email to