Hi Richard, It turned out that the patch proposed by you does not work properly for nested loop. If we slightly change pr70729.cc to (non-essential part is omitted void inline Ss::foo (float w) { #pragma omp simd for (int i=0; i<S_n; i++) { float w1 = C2[S_n + i] * w; v1.v_i[i] += (int)w1; C1[S_n + i] += w1; } } void Ss::boo (int n) { for (int i = 0; i<n; i++) foo(ww[i]); } loop in foo won't be vectorized since REF_LOOP is outer loop which is not marked with omp simd pragma: t1.cc:73:25: note: not vectorized: not suitable for scatter store *_31 = _33; t1.cc:73:25: note: bad data references.
Note that the check I proposed before works fine. 2016-07-20 12:35 GMT+03:00 Yuri Rumyantsev <ysrum...@gmail.com>: > Richard, > > Jakub has already fixed it. > Sorry for troubles. > Yuri. > > 2016-07-19 18:29 GMT+03:00 Renlin Li <renlin...@foss.arm.com>: >> Hi Yuri, >> >> I saw this test case runs on arm platforms, and maybe other platforms as >> well. >> >> testsuite/g++.dg/vect/pr70729.cc:7:10: fatal error: xmmintrin.h: No such >> file or directory >> >> Before the change here, it's gated by vect_simd_clones target selector, >> which limit it to i?86/x86_64 platform only. >> >> Regards, >> Renlin Li >> >> >> >> >> On 08/07/16 15:07, Yuri Rumyantsev wrote: >>> >>> Hi Richard, >>> >>> Thanks for your help - your patch looks much better. >>> Here is new patch in which additional argument was added to determine >>> source loop of reference. >>> >>> Bootstrap and regression testing did not show any new failures. >>> >>> Is it OK for trunk? >>> ChangeLog: >>> 2016-07-08 Yuri Rumyantsev <ysrum...@gmail.com> >>> >>> PR tree-optimization/71734 >>> * tree-ssa-loop-im.c (ref_indep_loop_p_1): Add REF_LOOP argument which >>> contains REF, use it to check safelen, assume that safelen value >>> must be greater 1, fix style. >>> (ref_indep_loop_p_2): Add REF_LOOP argument. >>> (ref_indep_loop_p): Pass LOOP as additional argument to >>> ref_indep_loop_p_2. >>> gcc/testsuite/ChangeLog: >>> * g++.dg/vect/pr70729.cc: Delete redundant dg options, fix style. >>> >>> 2016-07-08 11:18 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>>> >>>> On Thu, Jul 7, 2016 at 5:38 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>> wrote: >>>>> >>>>> I checked simd3.f90 and found out that my additional check reject >>>>> independence of references >>>>> >>>>> REF is independent in loop#3 >>>>> .istart0.19, .iend0.20 >>>>> which are defined in loop#1 which is outer for loop#3. >>>>> Note that these references are defined by >>>>> _103 = __builtin_GOMP_loop_dynamic_next (&.istart0.19, &.iend0.20); >>>>> which is in loop#1. >>>>> It is clear that both these references can not be independent for >>>>> loop#3. >>>> >>>> >>>> Ok, so we end up calling ref_indep_loop for ref in LOOP also for inner >>>> loops >>>> of LOOP to catch memory references in those as well. So the issue is >>>> really >>>> that we look at the wrong loop for safelen and we _do_ want to apply >>>> safelen >>>> to inner loops as well. >>>> >>>> So better track the loop we are ultimately asking the question for, like >>>> in the >>>> attached patch (fixes the testcase for me). >>>> >>>> Richard. >>>> >>>> >>>> >>>>> 2016-07-07 17:11 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>>>>> >>>>>> On Thu, Jul 7, 2016 at 4:04 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>> wrote: >>>>>>> >>>>>>> I Added this check because of new failures in libgomp.fortran suite. >>>>>>> Here is copy of Jakub message: >>>>>>> --- Comment #29 from Jakub Jelinek <jakub at gcc dot gnu.org> --- >>>>>>> The #c27 r237844 change looks bogus to me. >>>>>>> First of all, IMNSHO you can argue this way only if ref is a reference >>>>>>> seen in >>>>>>> loop LOOP, >>>>>> >>>>>> >>>>>> or inner loops of LOOP I guess. I _think_ we never call >>>>>> ref_indep_loop_p_1 with >>>>>> a REF whose loop is not a sub-loop of LOOP or LOOP itself (as it would >>>>>> not make >>>>>> sense to do that, it would be a waste of time). >>>>>> >>>>>> So only if "or inner loops of LOOP" is not correct the check would be >>>>>> needed >>>>>> but then my issue with unrolling an inner loop and turning a ref that >>>>>> safelen >>>>>> does not apply to into a ref that it now applies to arises. >>>>>> >>>>>> I don't fully get what Jakub is hinting at. >>>>>> >>>>>> Can you install the safelen > 0 -> safelen > 1 fix please? Jakub, can >>>>>> you >>>>>> explain that bitmap check with a simple testcase? >>>>>> >>>>>> Thanks, >>>>>> Richard. >>>>>> >>>>>>> which is the case of e.g. *.omp_data_i_23(D).a ref in simd3.f90 -O2 >>>>>>> -fopenmp -msse2, but not the D.3815[0] case tested during can_sm_ref_p >>>>>>> - the >>>>>>> D.3815[0] = 0; as well as something = D.3815[0]; stmt found in the >>>>>>> outer loop >>>>>>> obviously can be dependent on many of the loads and/or stores in the >>>>>>> loop, be >>>>>>> it "omp simd array" or not. >>>>>>> Say for >>>>>>> void >>>>>>> foo (int *p, int *q) >>>>>>> { >>>>>>> #pragma omp simd >>>>>>> for (int i = 0; i < 1024; i++) >>>>>>> p[i] += q[0]; >>>>>>> } >>>>>>> sure, q[0] can't alias p[0] ... p[1022], the earlier iterations could >>>>>>> write >>>>>>> something that changes its value, and then it would behave differently >>>>>>> from >>>>>>> using VF = 1024, where everything is performed in parallel. >>>>>>> Though, actually, it can alias, just it would have to write the same >>>>>>> value as >>>>>>> was there. So, if this is used to determine if it is safe to hoist >>>>>>> the load >>>>>>> before the loop, it is fine, if it is used to determine if &q[0] >= >>>>>>> &p[0] && >>>>>>> &q[0] <= &p[1023], then it is not fine. >>>>>>> >>>>>>> For aliasing of q[0] and p[1023], I don't see why they couldn't alias >>>>>>> in a >>>>>>> valid program. #pragma omp simd I think guarantees that the last >>>>>>> iteration is >>>>>>> executed last, it isn't necessarily executed last alone, it could be, >>>>>>> or >>>>>>> together with one before last iteration, or (for simdlen INT_MAX) even >>>>>>> all >>>>>>> iterations can be done concurrently, in hw or sw, so it is fine if it >>>>>>> is >>>>>>> transformed into: >>>>>>> int temp[1024], temp2[1024], temp3[1024]; >>>>>>> for (int i = 0; i < 1024; i++) >>>>>>> temp[i] = p[i]; >>>>>>> for (int i = 0; i < 1024; i++) >>>>>>> temp2[i] = q[0]; >>>>>>> /* The above two loops can be also swapped, or intermixed. */ >>>>>>> for (int i = 0; i < 1024; i++) >>>>>>> temp3[i] = temp[i] + temp2[i]; >>>>>>> for (int i = 0; i < 1024; i++) >>>>>>> p[i] = temp3[i]; >>>>>>> /* Or the above loop reversed etc. */ >>>>>>> >>>>>>> If you have: >>>>>>> int >>>>>>> bar (int *p, int *q) >>>>>>> { >>>>>>> q[0] = 0; >>>>>>> #pragma omp simd >>>>>>> for (int i = 0; i < 1024; i++) >>>>>>> p[i]++; >>>>>>> return q[0]; >>>>>>> } >>>>>>> i.e. something similar to what misbehaves in simd3.f90 with the >>>>>>> change, then >>>>>>> the answer is that q[0] isn't guaranteed to be independent of any >>>>>>> references in >>>>>>> the simd loop. >>>>>>> >>>>>>> 2016-07-07 16:57 GMT+03:00 Richard Biener >>>>>>> <richard.guent...@gmail.com>: >>>>>>>> >>>>>>>> On Thu, Jul 7, 2016 at 3:24 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Richard, >>>>>>>>> >>>>>>>>> Did you meas the following check enhancement: >>>>>>>>> >>>>>>>>> outer = loop_outer (loop); >>>>>>>>> /* We consider REF defined in LOOP as independent if at least 2 >>>>>>>>> loop >>>>>>>>> iterations are not dependent. */ >>>>>>>>> if ((loop->safelen > 1 >>>>>>>>> || (outer && outer->safelen > 1)) >>>>>>>>> && bitmap_bit_p (&memory_accesses.refs_in_loop[loop->num], >>>>>>>>> ref->id)) >>>>>>>>> return true; >>>>>>>>> which allows us to consider reference x[0] as invariant for the test >>>>>>>>> #pragma omp simd >>>>>>>>> for (i = 0; i< 100; i++) >>>>>>>>> { >>>>>>>>> y[i] = i * 2; >>>>>>>>> for (j = 0; j < 100; j++) >>>>>>>>> z[j] += x[0]; >>>>>>>>> } >>>>>>>>> >>>>>>>>> Moving statement >>>>>>>>> _9 = *x_19(D); >>>>>>>>> (cost 20) out of loop 1. >>>>>>>> >>>>>>>> >>>>>>>> outer loops will be automatically considered by outermost_indep_loop >>>>>>>> which eventually >>>>>>>> calls the function you are patching. >>>>>>>> >>>>>>>> I was questioning the added test >>>>>>>> >>>>>>>> && bitmap_bit_p (&memory_accesses.refs_in_loop[loop->num], >>>>>>>> ref->id)) >>>>>>>> >>>>>>>> and still do not understand why you need that. It makes us only >>>>>>>> consider the >>>>>>>> loop of ref itself when looking at safelen (but not refs that belong >>>>>>>> to inner loops >>>>>>>> of loop). >>>>>>>> >>>>>>>> Richard. >>>>>>>> >>>>>>>> >>>>>>>>> 2016-07-07 11:46 GMT+03:00 Richard Biener >>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>> >>>>>>>>>> On Wed, Jul 6, 2016 at 4:42 PM, Yuri Rumyantsev >>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>> Richard, >>>>>>>>>>> >>>>>>>>>>> I don't understand your question and how it is related to proposed >>>>>>>>>>> patch. >>>>>>>>>>> >>>>>>>>>>> 2016-07-06 16:55 GMT+03:00 Richard Biener >>>>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Jul 6, 2016 at 3:50 PM, Yuri Rumyantsev >>>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> See for example Jakub comments for 70729. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> But how can the check be valid given a >>>>>>>>>>>> >>>>>>>>>>>> #pragma omp simd >>>>>>>>>>>> for (;;) >>>>>>>>>>>> { >>>>>>>>>>>> for (;;) >>>>>>>>>>>> ...ref in question... >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> can be transformed to >>>>>>>>>>>> >>>>>>>>>>>> #pragma omp simd >>>>>>>>>>>> for (;;) >>>>>>>>>>>> { >>>>>>>>>>>> ...ref in question... >>>>>>>>>>>> ...ref in question.. >>>>>>>>>>>> ... >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> by means of unrolling? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> The bitmap check would return false for the not unrolled inner loop >>>>>>>>>> and true for the inner unrolled loop. >>>>>>>>>> So it cannot be correct. >>>>>>>>>> >>>>>>>>>> (not sure why this is all off-list btw) >>>>>>>>>> >>>>>>>>>> Richard. >>>>>>>>>> >>>>>>>>>>>> Richard. >>>>>>>>>>>> >>>>>>>>>>>>> 2016-07-06 16:25 GMT+03:00 Richard Biener >>>>>>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Jul 6, 2016 at 1:43 PM, Yuri Rumyantsev >>>>>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I did not learn LIM detailed but I see on the following >>>>>>>>>>>>>>> example >>>>>>>>>>>>>>> #pragma omp simd >>>>>>>>>>>>>>> for (i = 0; i< 100; i++) >>>>>>>>>>>>>>> { >>>>>>>>>>>>>>> y[i] = i * 2; >>>>>>>>>>>>>>> for (j = 0; j < 100; j++) >>>>>>>>>>>>>>> z[j] += x[0]; >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> that only inner,ost loop is handled: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Memory reference in loop#2 1: *_7 >>>>>>>>>>>>>>> Memory reference in loop#2 2: *x_19(D) >>>>>>>>>>>>>>> Memory reference in loop#1 3: *_3 >>>>>>>>>>>>>>> Basic block 3 (loop 1 -- depth 1): >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Basic block 4 (loop 2 -- depth 2): >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Loop#2 is innermost >>>>>>>>>>>>>>> Querying dependency of refs 2 and 1: dependent. >>>>>>>>>>>>>>> Querying dependencies of ref 2 in loop 2: dependent >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> well, LIM doesn't bother to check against refs in an outer loop >>>>>>>>>>>>>> if >>>>>>>>>>>>>> refs in the inner loop are already dependent. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Loop unroll of inner loop does not invalidate 'safelen' of >>>>>>>>>>>>>>> outer one >>>>>>>>>>>>>>> accordingly with safelen definition. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> So why do you need the bitmap_bit_p check then? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Richard. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> 2016-07-06 14:23 GMT+03:00 Richard Biener >>>>>>>>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Wed, Jul 6, 2016 at 1:17 PM, Yuri Rumyantsev >>>>>>>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Richard, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> LIM phase considers only innermost loops and does not handle >>>>>>>>>>>>>>>>> loop >>>>>>>>>>>>>>>>> nests, so in your case the innermost loop (j) does not have >>>>>>>>>>>>>>>>> non-zero >>>>>>>>>>>>>>>>> safelen. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Huh? LIM considers loop nests just fine. And yes, the >>>>>>>>>>>>>>>> innermost loop >>>>>>>>>>>>>>>> does not have safelen but shouldn't it? Consider the inner >>>>>>>>>>>>>>>> loop being unrolled >>>>>>>>>>>>>>>> by GCC so it is removed - should that then invalidate the >>>>>>>>>>>>>>>> outer loop safelen >>>>>>>>>>>>>>>> because innermost loop refs now appear in the outer loop? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Richard. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 2016-07-06 13:34 GMT+03:00 Richard Biener >>>>>>>>>>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Wed, Jul 6, 2016 at 11:50 AM, Yuri Rumyantsev >>>>>>>>>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Richard, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I pointed out in the commentary that REF is defined inside >>>>>>>>>>>>>>>>>>> loop and >>>>>>>>>>>>>>>>>>> this check is related to this statement. Should I clarify >>>>>>>>>>>>>>>>>>> it? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> + /* We consider REF defined in LOOP as independent if at >>>>>>>>>>>>>>>>>>> least 2 loop >>>>>>>>>>>>>>>>>>> + iterations are not dependent. */ >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Yes, I fail to see why x[0] should not be disambiguated >>>>>>>>>>>>>>>>>> against y[i] in >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> #pragma GCC loop ivdep >>>>>>>>>>>>>>>>>> for (i...) >>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>> y[i] = ...; >>>>>>>>>>>>>>>>>> for (j...) >>>>>>>>>>>>>>>>>> ... = x[0]; >>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> REF is always inside the loop nest with LOOP being the >>>>>>>>>>>>>>>>>> outermost loop. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Richard. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 2016-07-06 12:38 GMT+03:00 Richard Biener >>>>>>>>>>>>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Tue, Jul 5, 2016 at 4:56 PM, Yuri Rumyantsev >>>>>>>>>>>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Hi All, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Here is a simple fix to cure regressions introduced by >>>>>>>>>>>>>>>>>>>>> my fix for >>>>>>>>>>>>>>>>>>>>> 70729. Patch also contains minor changes in test found >>>>>>>>>>>>>>>>>>>>> by Jakub. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Bootstrapping and regression testing did not show any >>>>>>>>>>>>>>>>>>>>> new failures. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Is it OK for trunk? >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> + && bitmap_bit_p >>>>>>>>>>>>>>>>>>>> (&memory_accesses.refs_in_loop[loop->num], ref->id)) >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> So safelen does not apply to refs in nested loops? The >>>>>>>>>>>>>>>>>>>> added comment only >>>>>>>>>>>>>>>>>>>> explains the safelen check change but this also requires >>>>>>>>>>>>>>>>>>>> explanation. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Richard. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> ChangeLog: >>>>>>>>>>>>>>>>>>>>> 2016-07-05 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> PR tree-optimization/71734 >>>>>>>>>>>>>>>>>>>>> * tree-ssa-loop-im.c (ref_indep_loop_p_1): Consider REF >>>>>>>>>>>>>>>>>>>>> defined in >>>>>>>>>>>>>>>>>>>>> LOOP as independent if at least two loop iterations are >>>>>>>>>>>>>>>>>>>>> not dependent. >>>>>>>>>>>>>>>>>>>>> gcc/testsuite/ChangeLog: >>>>>>>>>>>>>>>>>>>>> * g++.dg/vect/pr70729.cc: Delete redundant dg >>>>>>>>>>>>>>>>>>>>> options, fix style.
t1.cc
Description: Binary data