On Thu, 13 Mar 2014, Cong Hou wrote:

> On Thu, Mar 13, 2014 at 2:27 AM, Richard Biener <rguent...@suse.de> wrote:
> > On Wed, 12 Mar 2014, Cong Hou wrote:
> >
> >> Thank you for pointing it out. I didn't realized that alias analysis
> >> has influences on this issue.
> >>
> >> The current problem is that the epilogue may be unnecessary if the
> >> loop bound cannot be larger than the number of iterations of the
> >> vectorized loop multiplied by VF when the vectorized loop is supposed
> >> to be executed. My method is incorrect because I assume the vectorized
> >> loop will be executed which is actually guaranteed by loop bound check
> >> (and also alias checks). So if the alias checks exist, my method is
> >> fine as both conditions are met.
> >
> > But there is still the loop bound check which, if it fails, uses
> > the epilogue loop as fallback, not the scalar versioned loop.
> 
> The loop bound check is already performed together with alias checks
> (assume we need alias checks). Actually, I did observe that the loop
> bound check in the true body of alias checks may be unnecessary. For
> example, for the following loop
> 
>   for(i=0; i < num ; ++i)
>     out[i] = (ovec[i] = in[i]);
> 
> GCC now generates the following GIMPLE code after vectorization:
> 
> 
> 
>   <bb 3>: // loop bound check (with cost model) and alias checks
>   _29 = (unsigned int) num_5(D);
>   _28 = _29 > 15;
>   _24 = in_9(D) + 16;
>   _23 = out_7(D) >= _24;
>   _2 = out_7(D) + 16;
>   _1 = _2 <= in_9(D);
>   _32 = _1 | _23;
>   _31 = _28 & _32;
>   if (_31 != 0)
>     goto <bb 4>;
>   else
>     goto <bb 12>;
> 
>   <bb 4>:
>   niters.3_44 = (unsigned int) num_5(D);
>   _46 = niters.3_44 + 4294967280;
>   _47 = _46 >> 4;
>   bnd.4_45 = _47 + 1;
>   ratio_mult_vf.5_48 = bnd.4_45 << 4;
>   _59 = (unsigned int) num_5(D);
>   _60 = _59 + 4294967295;
>   if (_60 <= 14)                      <================ is this necessary?
>     goto <bb 10>;
>   else
>     goto <bb 5>;
> 
> 
> The check _60<=14 should be unnecessary because it is implied by the
> fact _29 > 15 in bb3.

I agree that the code the vectorizer generates is less than optimal
in some cases (I filed a PR about this some time ago and fixed some
cases).

> Consider this fact and if there are alias checks, we can safely remove
> the epilogue if the maximum trip count of the loop is less than or
> equal to the calculated threshold.

You have to consider n % vf != 0, so an argument on only maximum
trip count or threshold cannot work.

Richard.

> 
> Cong
> 
> 
> 
> >
> >> If there is no alias checks, I must
> >> consider the possibility that the vectorized loop may not be executed
> >> at runtime and then the epilogue should not be eliminated. The warning
> >> appears on epilogue, and with loop bound checks (and without alias
> >> checks) the warning will be gone. So I think the key is alias checks:
> >> my method only works if there is no alias checks.
> >>
> >> How about adding one more condition that checks if alias checks are
> >> needed, as the code shown below?
> >>
> >>   else if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
> >>   || (tree_ctz (LOOP_VINFO_NITERS (loop_vinfo))
> >>       < (unsigned)exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo))
> >>       && (!LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo)
> >>                    || (unsigned HOST_WIDE_INT)max_stmt_executions_int
> >>        (LOOP_VINFO_LOOP (loop_vinfo)) > (unsigned)th)))
> >>     LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = true;
> >>
> >>
> >> thanks,
> >> Cong
> >>
> >>
> >> On Wed, Mar 12, 2014 at 1:24 AM, Jakub Jelinek <ja...@redhat.com> wrote:
> >> > On Tue, Mar 11, 2014 at 04:16:13PM -0700, Cong Hou wrote:
> >> >> This patch is fixing PR60505 in which the vectorizer may produce
> >> >> unnecessary epilogues.
> >> >>
> >> >> Bootstrapped and tested on a x86_64 machine.
> >> >>
> >> >> OK for trunk?
> >> >
> >> > That looks wrong.  Consider the case where the loop isn't versioned,
> >> > if you disable generation of the epilogue loop, you end up only with
> >> > a vector loop.
> >> >
> >> > Say:
> >> > unsigned char ovec[16] __attribute__((aligned (16))) = { 0 };
> >> > void
> >> > foo (char *__restrict in, char *__restrict out, int num)
> >> > {
> >> >   int i;
> >> >
> >> >   in = __builtin_assume_aligned (in, 16);
> >> >   out = __builtin_assume_aligned (out, 16);
> >> >   for (i = 0; i < num; ++i)
> >> >     out[i] = (ovec[i] = in[i]);
> >> >   out[num] = ovec[num / 2];
> >> > }
> >> > -O2 -ftree-vectorize.  Now, consider if this function is called
> >> > with num != 16 (num > 16 is of course invalid, but num 0 to 15 is
> >> > valid and your patch will cause a wrong-code in this case).
> >> >
> >> >         Jakub
> >>
> >>
> >
> > --
> > Richard Biener <rguent...@suse.de>
> > SUSE / SUSE Labs
> > SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
> > GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer

Reply via email to