On Thu, 13 Mar 2014, Cong Hou wrote: > On Thu, Mar 13, 2014 at 2:27 AM, Richard Biener <rguent...@suse.de> wrote: > > On Wed, 12 Mar 2014, Cong Hou wrote: > > > >> Thank you for pointing it out. I didn't realized that alias analysis > >> has influences on this issue. > >> > >> The current problem is that the epilogue may be unnecessary if the > >> loop bound cannot be larger than the number of iterations of the > >> vectorized loop multiplied by VF when the vectorized loop is supposed > >> to be executed. My method is incorrect because I assume the vectorized > >> loop will be executed which is actually guaranteed by loop bound check > >> (and also alias checks). So if the alias checks exist, my method is > >> fine as both conditions are met. > > > > But there is still the loop bound check which, if it fails, uses > > the epilogue loop as fallback, not the scalar versioned loop. > > The loop bound check is already performed together with alias checks > (assume we need alias checks). Actually, I did observe that the loop > bound check in the true body of alias checks may be unnecessary. For > example, for the following loop > > for(i=0; i < num ; ++i) > out[i] = (ovec[i] = in[i]); > > GCC now generates the following GIMPLE code after vectorization: > > > > <bb 3>: // loop bound check (with cost model) and alias checks > _29 = (unsigned int) num_5(D); > _28 = _29 > 15; > _24 = in_9(D) + 16; > _23 = out_7(D) >= _24; > _2 = out_7(D) + 16; > _1 = _2 <= in_9(D); > _32 = _1 | _23; > _31 = _28 & _32; > if (_31 != 0) > goto <bb 4>; > else > goto <bb 12>; > > <bb 4>: > niters.3_44 = (unsigned int) num_5(D); > _46 = niters.3_44 + 4294967280; > _47 = _46 >> 4; > bnd.4_45 = _47 + 1; > ratio_mult_vf.5_48 = bnd.4_45 << 4; > _59 = (unsigned int) num_5(D); > _60 = _59 + 4294967295; > if (_60 <= 14) <================ is this necessary? > goto <bb 10>; > else > goto <bb 5>; > > > The check _60<=14 should be unnecessary because it is implied by the > fact _29 > 15 in bb3.
I agree that the code the vectorizer generates is less than optimal in some cases (I filed a PR about this some time ago and fixed some cases). > Consider this fact and if there are alias checks, we can safely remove > the epilogue if the maximum trip count of the loop is less than or > equal to the calculated threshold. You have to consider n % vf != 0, so an argument on only maximum trip count or threshold cannot work. Richard. > > Cong > > > > > > >> If there is no alias checks, I must > >> consider the possibility that the vectorized loop may not be executed > >> at runtime and then the epilogue should not be eliminated. The warning > >> appears on epilogue, and with loop bound checks (and without alias > >> checks) the warning will be gone. So I think the key is alias checks: > >> my method only works if there is no alias checks. > >> > >> How about adding one more condition that checks if alias checks are > >> needed, as the code shown below? > >> > >> else if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) > >> || (tree_ctz (LOOP_VINFO_NITERS (loop_vinfo)) > >> < (unsigned)exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo)) > >> && (!LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo) > >> || (unsigned HOST_WIDE_INT)max_stmt_executions_int > >> (LOOP_VINFO_LOOP (loop_vinfo)) > (unsigned)th))) > >> LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = true; > >> > >> > >> thanks, > >> Cong > >> > >> > >> On Wed, Mar 12, 2014 at 1:24 AM, Jakub Jelinek <ja...@redhat.com> wrote: > >> > On Tue, Mar 11, 2014 at 04:16:13PM -0700, Cong Hou wrote: > >> >> This patch is fixing PR60505 in which the vectorizer may produce > >> >> unnecessary epilogues. > >> >> > >> >> Bootstrapped and tested on a x86_64 machine. > >> >> > >> >> OK for trunk? > >> > > >> > That looks wrong. Consider the case where the loop isn't versioned, > >> > if you disable generation of the epilogue loop, you end up only with > >> > a vector loop. > >> > > >> > Say: > >> > unsigned char ovec[16] __attribute__((aligned (16))) = { 0 }; > >> > void > >> > foo (char *__restrict in, char *__restrict out, int num) > >> > { > >> > int i; > >> > > >> > in = __builtin_assume_aligned (in, 16); > >> > out = __builtin_assume_aligned (out, 16); > >> > for (i = 0; i < num; ++i) > >> > out[i] = (ovec[i] = in[i]); > >> > out[num] = ovec[num / 2]; > >> > } > >> > -O2 -ftree-vectorize. Now, consider if this function is called > >> > with num != 16 (num > 16 is of course invalid, but num 0 to 15 is > >> > valid and your patch will cause a wrong-code in this case). > >> > > >> > Jakub > >> > >> > > > > -- > > Richard Biener <rguent...@suse.de> > > SUSE / SUSE Labs > > SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 > > GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer > > -- Richard Biener <rguent...@suse.de> SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer