On June 30, 2020 4:23:03 PM GMT+02:00, Richard Sandiford <richard.sandif...@arm.com> wrote:
>Richard Biener <rguent...@suse.de> writes:
>> On Tue, 30 Jun 2020, Richard Sandiford wrote:
>>
>>> Richard Biener <richard.guent...@gmail.com> writes:
>>> > On Tue, Jun 30, 2020 at 2:18 PM Richard Sandiford
>>> > <richard.sandif...@arm.com> wrote:
>>> >>
>>> >> "Yangfei (Felix)" <felix.y...@huawei.com> writes:
>>> >> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c b/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
>>> >> > index e050db1a2e4..ea39fcac0e0 100644
>>> >> > --- a/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
>>> >> > +++ b/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
>>> >> > @@ -1,6 +1,7 @@
>>> >> >  /* { dg-do compile } */
>>> >> >  /* { dg-additional-options "-O3" } */
>>> >> >  /* { dg-additional-options "-mavx2" { target { i?86-*-* x86_64-*-* } } } */
>>> >> > +/* { dg-additional-options "-march=armv8.2-a+sve -fno-vect-cost-model" { target aarch64*-*-* } } */
>>> >> >
>>> >> >  typedef struct {
>>> >> >    unsigned short mprr_2[5][16][16];
>>> >>
>>> >> This test is useful for Advanced SIMD too, so I think we should continue
>>> >> to test it with whatever options the person running the testsuite chose.
>>> >> Instead we could duplicate the test in gcc.target/aarch64/sve with
>>> >> appropriate options.
>>> >>
>>> >> > diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
>>> >> > index eb8288e7a85..b30a7d8a3bb 100644
>>> >> > --- a/gcc/tree-vect-data-refs.c
>>> >> > +++ b/gcc/tree-vect-data-refs.c
>>> >> > @@ -1823,8 +1823,11 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
>>> >> >  	    {
>>> >> >  	      poly_uint64 nscalars = (STMT_SLP_TYPE (stmt_info)
>>> >> >  				      ? vf * DR_GROUP_SIZE (stmt_info) : vf);
>>> >> > -	      possible_npeel_number
>>> >> > -		= vect_get_num_vectors (nscalars, vectype);
>>> >> > +	      if (maybe_lt (nscalars, TYPE_VECTOR_SUBPARTS (vectype)))
>>> >> > +		possible_npeel_number = 0;
>>> >> > +	      else
>>> >> > +		possible_npeel_number
>>> >> > +		  = vect_get_num_vectors (nscalars, vectype);
>>> >> >
>>> >> >  	      /* NPEEL_TMP is 0 when there is no misalignment, but also
>>> >> >  		 allow peeling NELEMENTS.  */
>>> >>
>>> >> OK, so this is coming from:
>>> >>
>>> >>   int s[16][2];
>>> >>   …
>>> >>   … =s[j][1];
>>> >>
>>> >> and an SLP node containing 16 instances of “s[j][1]”.  The DR_GROUP_SIZE
>>> >> is 2 because that's the inner dimension of “s”.
>>> >>
>>> >> I don't think maybe_lt is right here though.  The same problem could in
>>> >> principle happen for cases in which NSCALARS > TYPE_VECTOR_SUBPARTS,
>>> >> e.g. for different inner dimensions of “s”.
>>> >>
>>> >> I think the testcase shows that using DR_GROUP_SIZE in this calculation
>>> >> is flawed.  I'm not sure whether we can really do better given the current
>>> >> representation though.  This is one case where having a separate dr_vec_info
>>> >> per SLP node would help.
>>> >
>>> > I guess what the code likes to know is what we now have in SLP_TREE_LANES
>>> > (or formerly group_size) but that's not necessarily connected to DR_GROUP_SIZE.
>>> > Given we only see a stmt_info here and there's no 1:1 mapping of SLP node
>>> > to stmt_info (and the reverse mapping doesn't even exist) I do not have
>>> > any good idea either.
>>> >
>>> > Honestly I don't really see what this code tries to do ... doesn't it
>>> > simply want to set possible_npeel_number to TYPE_VECTOR_SUBPARTS (vectype)?!
>>>
>>> I think it's trying to set possible_npeel_number to the number of
>>> vector stmts per iteration (i.e. ncopies for non-SLP stuff):
>>>
>>>   /* For multiple types, it is possible that the bigger type access
>>>      will have more than one peeling option.
>>>      E.g., a loop with two
>>>      types: one of size (vector size / 4), and the other one of
>>>      size (vector size / 8).  Vectorization factor will be 8.  If both
>>>      accesses are misaligned by 3, the first one needs one scalar
>>>      iteration to be aligned, and the second one needs 5.  But the
>>>      first one will be aligned also by peeling 5 scalar
>>>      iterations, and in that case both accesses will be aligned.
>>>      Hence, except for the immediate peeling amount, we also want
>>>      to try to add full vector size, while we don't exceed
>>>      vectorization factor.
>>>      We do this automatically for cost model, since we calculate
>>>      cost for every peeling option.  */
>>>
>>> So for this example, possible_npeel_number==2 for the first access
>>> (letting us try two peeling values for it) and possible_npeel_number==1
>>> for the second.
>>
>> Yeah, but npeel is _scalar_ lanes, since we always peel scalar iterations.
>> So it seems odd to somehow put in the number of vectors... so to me
>> it would have made sense if it did
>>
>>   possible_npeel_number = lower_bound (nscalars);
>>
>> or whatever is necessary to make the polys happy.  Thus simply elide
>> the vect_get_num_vectors call?  But it's been very long since I've
>> dived into the alignment peeling code...
>
>Ah, I see what you mean.  So rather than:
>
>  /* Save info about DR in the hash table.  Also include peeling
>     amounts according to the explanation above.  */
>  for (j = 0; j < possible_npeel_number; j++)
>    {
>      vect_peeling_hash_insert (&peeling_htab, loop_vinfo,
>				dr_info, npeel_tmp);
>      npeel_tmp += target_align / dr_size;
>    }
>
>just have something like:
>
>  while (known_le (npeel_tmp, nscalars))
>    {
>      …
>    }
>
>?
Yeah.

Richard.