On Wed, 12 Mar 2014, Cong Hou wrote:

> Thank you for pointing it out. I didn't realized that alias analysis
> has influences on this issue.
> 
> The current problem is that the epilogue may be unnecessary if the
> loop bound cannot be larger than the number of iterations of the
> vectorized loop multiplied by VF when the vectorized loop is supposed
> to be executed. My method is incorrect because I assume the vectorized
> loop will be executed which is actually guaranteed by loop bound check
> (and also alias checks). So if the alias checks exist, my method is
> fine as both conditions are met.

But there is still the loop bound check which, if it fails, uses
the epilogue loop as fallback, not the scalar versioned loop.

> If there is no alias checks, I must
> consider the possibility that the vectorized loop may not be executed
> at runtime and then the epilogue should not be eliminated. The warning
> appears on epilogue, and with loop bound checks (and without alias
> checks) the warning will be gone. So I think the key is alias checks:
> my method only works if there is no alias checks.
> 
> How about adding one more condition that checks if alias checks are
> needed, as the code shown below?
> 
>   else if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
>   || (tree_ctz (LOOP_VINFO_NITERS (loop_vinfo))
>       < (unsigned)exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo))
>       && (!LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo)
>                    || (unsigned HOST_WIDE_INT)max_stmt_executions_int
>        (LOOP_VINFO_LOOP (loop_vinfo)) > (unsigned)th)))
>     LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = true;
> 
> 
> thanks,
> Cong
> 
> 
> On Wed, Mar 12, 2014 at 1:24 AM, Jakub Jelinek <ja...@redhat.com> wrote:
> > On Tue, Mar 11, 2014 at 04:16:13PM -0700, Cong Hou wrote:
> >> This patch is fixing PR60505 in which the vectorizer may produce
> >> unnecessary epilogues.
> >>
> >> Bootstrapped and tested on a x86_64 machine.
> >>
> >> OK for trunk?
> >
> > That looks wrong.  Consider the case where the loop isn't versioned,
> > if you disable generation of the epilogue loop, you end up only with
> > a vector loop.
> >
> > Say:
> > unsigned char ovec[16] __attribute__((aligned (16))) = { 0 };
> > void
> > foo (char *__restrict in, char *__restrict out, int num)
> > {
> >   int i;
> >
> >   in = __builtin_assume_aligned (in, 16);
> >   out = __builtin_assume_aligned (out, 16);
> >   for (i = 0; i < num; ++i)
> >     out[i] = (ovec[i] = in[i]);
> >   out[num] = ovec[num / 2];
> > }
> > -O2 -ftree-vectorize.  Now, consider if this function is called
> > with num != 16 (num > 16 is of course invalid, but num 0 to 15 is
> > valid and your patch will cause a wrong-code in this case).
> >
> >         Jakub
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer

Reply via email to