Hi Richard,

It turned out that the patch proposed by you does not work properly
for nested loop. If we slightly change pr70729.cc to
(non-essential part is omitted
void inline Ss::foo (float w)
{
#pragma omp simd
  for (int i=0; i<S_n; i++)
    {
      float w1 = C2[S_n + i] * w;
      v1.v_i[i] += (int)w1;
      C1[S_n + i] += w1;
    }
}
void Ss::boo (int n)
{
  for (int i = 0; i<n; i++)
    foo(ww[i]);
}
loop in foo won't be vectorized since REF_LOOP is outer loop which is
not marked with omp simd pragma:
 t1.cc:73:25: note: not vectorized: not suitable for scatter store *_31 = _33;
t1.cc:73:25: note: bad data references.

Note that the check I proposed before works fine.

2016-07-20 12:35 GMT+03:00 Yuri Rumyantsev <ysrum...@gmail.com>:
> Richard,
>
> Jakub has already fixed it.
> Sorry for troubles.
> Yuri.
>
> 2016-07-19 18:29 GMT+03:00 Renlin Li <renlin...@foss.arm.com>:
>> Hi Yuri,
>>
>> I saw this test case runs on arm platforms, and maybe other platforms as
>> well.
>>
>> testsuite/g++.dg/vect/pr70729.cc:7:10: fatal error: xmmintrin.h: No such
>> file or directory
>>
>> Before the change here, it's gated by vect_simd_clones target selector,
>> which limit it to i?86/x86_64 platform only.
>>
>> Regards,
>> Renlin Li
>>
>>
>>
>>
>> On 08/07/16 15:07, Yuri Rumyantsev wrote:
>>>
>>> Hi Richard,
>>>
>>> Thanks for your help - your patch looks much better.
>>> Here is new patch in which additional argument was added to determine
>>> source loop of reference.
>>>
>>> Bootstrap and regression testing did not show any new failures.
>>>
>>> Is it OK for trunk?
>>> ChangeLog:
>>> 2016-07-08  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>
>>> PR tree-optimization/71734
>>> * tree-ssa-loop-im.c (ref_indep_loop_p_1): Add REF_LOOP argument which
>>> contains REF, use it to check safelen, assume that safelen value
>>> must be greater 1, fix style.
>>> (ref_indep_loop_p_2): Add REF_LOOP argument.
>>> (ref_indep_loop_p): Pass LOOP as additional argument to
>>> ref_indep_loop_p_2.
>>> gcc/testsuite/ChangeLog:
>>>          * g++.dg/vect/pr70729.cc: Delete redundant dg options, fix style.
>>>
>>> 2016-07-08 11:18 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>>>
>>>> On Thu, Jul 7, 2016 at 5:38 PM, Yuri Rumyantsev <ysrum...@gmail.com>
>>>> wrote:
>>>>>
>>>>> I checked simd3.f90 and found out that my additional check reject
>>>>> independence of references
>>>>>
>>>>> REF is independent in loop#3
>>>>> .istart0.19, .iend0.20
>>>>> which are defined in loop#1 which is outer for loop#3.
>>>>> Note that these references are defined by
>>>>> _103 = __builtin_GOMP_loop_dynamic_next (&.istart0.19, &.iend0.20);
>>>>> which is in loop#1.
>>>>> It is clear that both these references can not be independent for
>>>>> loop#3.
>>>>
>>>>
>>>> Ok, so we end up calling ref_indep_loop for ref in LOOP also for inner
>>>> loops
>>>> of LOOP to catch memory references in those as well.  So the issue is
>>>> really
>>>> that we look at the wrong loop for safelen and we _do_ want to apply
>>>> safelen
>>>> to inner loops as well.
>>>>
>>>> So better track the loop we are ultimately asking the question for, like
>>>> in the
>>>> attached patch (fixes the testcase for me).
>>>>
>>>> Richard.
>>>>
>>>>
>>>>
>>>>> 2016-07-07 17:11 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>>>>>
>>>>>> On Thu, Jul 7, 2016 at 4:04 PM, Yuri Rumyantsev <ysrum...@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> I Added this check because of new failures in libgomp.fortran suite.
>>>>>>> Here is copy of Jakub message:
>>>>>>> --- Comment #29 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
>>>>>>> The #c27 r237844 change looks bogus to me.
>>>>>>> First of all, IMNSHO you can argue this way only if ref is a reference
>>>>>>> seen in
>>>>>>> loop LOOP,
>>>>>>
>>>>>>
>>>>>> or inner loops of LOOP I guess.  I _think_ we never call
>>>>>> ref_indep_loop_p_1 with
>>>>>> a REF whose loop is not a sub-loop of LOOP or LOOP itself (as it would
>>>>>> not make
>>>>>> sense to do that, it would be a waste of time).
>>>>>>
>>>>>> So only if "or inner loops of LOOP" is not correct the check would be
>>>>>> needed
>>>>>> but then my issue with unrolling an inner loop and turning a ref that
>>>>>> safelen
>>>>>> does not apply to into a ref that it now applies to arises.
>>>>>>
>>>>>> I don't fully get what Jakub is hinting at.
>>>>>>
>>>>>> Can you install the safelen > 0 -> safelen > 1 fix please?  Jakub, can
>>>>>> you
>>>>>> explain that bitmap check with a simple testcase?
>>>>>>
>>>>>> Thanks,
>>>>>> Richard.
>>>>>>
>>>>>>> which is the case of e.g. *.omp_data_i_23(D).a ref in simd3.f90 -O2
>>>>>>> -fopenmp -msse2, but not the D.3815[0] case tested during can_sm_ref_p
>>>>>>> - the
>>>>>>> D.3815[0] = 0; as well as something = D.3815[0]; stmt found in the
>>>>>>> outer loop
>>>>>>> obviously can be dependent on many of the loads and/or stores in the
>>>>>>> loop, be
>>>>>>> it "omp simd array" or not.
>>>>>>> Say for
>>>>>>> void
>>>>>>> foo (int *p, int *q)
>>>>>>> {
>>>>>>>    #pragma omp simd
>>>>>>>    for (int i = 0; i < 1024; i++)
>>>>>>>      p[i] += q[0];
>>>>>>> }
>>>>>>> sure, q[0] can't alias p[0] ... p[1022], the earlier iterations could
>>>>>>> write
>>>>>>> something that changes its value, and then it would behave differently
>>>>>>> from
>>>>>>> using VF = 1024, where everything is performed in parallel.
>>>>>>> Though, actually, it can alias, just it would have to write the same
>>>>>>> value as
>>>>>>> was there.  So, if this is used to determine if it is safe to hoist
>>>>>>> the load
>>>>>>> before the loop, it is fine, if it is used to determine if &q[0] >=
>>>>>>> &p[0] &&
>>>>>>> &q[0] <= &p[1023], then it is not fine.
>>>>>>>
>>>>>>> For aliasing of q[0] and p[1023], I don't see why they couldn't alias
>>>>>>> in a
>>>>>>> valid program.  #pragma omp simd I think guarantees that the last
>>>>>>> iteration is
>>>>>>> executed last, it isn't necessarily executed last alone, it could be,
>>>>>>> or
>>>>>>> together with one before last iteration, or (for simdlen INT_MAX) even
>>>>>>> all
>>>>>>> iterations can be done concurrently, in hw or sw, so it is fine if it
>>>>>>> is
>>>>>>> transformed into:
>>>>>>>    int temp[1024], temp2[1024], temp3[1024];
>>>>>>>    for (int i = 0; i < 1024; i++)
>>>>>>>      temp[i] = p[i];
>>>>>>>    for (int i = 0; i < 1024; i++)
>>>>>>>      temp2[i] = q[0];
>>>>>>>    /* The above two loops can be also swapped, or intermixed.  */
>>>>>>>    for (int i = 0; i < 1024; i++)
>>>>>>>      temp3[i] = temp[i] + temp2[i];
>>>>>>>    for (int i = 0; i < 1024; i++)
>>>>>>>      p[i] = temp3[i];
>>>>>>>    /* Or the above loop reversed etc. */
>>>>>>>
>>>>>>> If you have:
>>>>>>> int
>>>>>>> bar (int *p, int *q)
>>>>>>> {
>>>>>>>    q[0] = 0;
>>>>>>>    #pragma omp simd
>>>>>>>    for (int i = 0; i < 1024; i++)
>>>>>>>      p[i]++;
>>>>>>>    return q[0];
>>>>>>> }
>>>>>>> i.e. something similar to what misbehaves in simd3.f90 with the
>>>>>>> change, then
>>>>>>> the answer is that q[0] isn't guaranteed to be independent of any
>>>>>>> references in
>>>>>>> the simd loop.
>>>>>>>
>>>>>>> 2016-07-07 16:57 GMT+03:00 Richard Biener
>>>>>>> <richard.guent...@gmail.com>:
>>>>>>>>
>>>>>>>> On Thu, Jul 7, 2016 at 3:24 PM, Yuri Rumyantsev <ysrum...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Richard,
>>>>>>>>>
>>>>>>>>> Did you meas the following check enhancement:
>>>>>>>>>
>>>>>>>>>    outer = loop_outer (loop);
>>>>>>>>>    /* We consider REF defined in LOOP as independent if at least 2
>>>>>>>>> loop
>>>>>>>>>       iterations are not dependent.  */
>>>>>>>>>    if ((loop->safelen > 1
>>>>>>>>>         || (outer && outer->safelen > 1))
>>>>>>>>>        && bitmap_bit_p (&memory_accesses.refs_in_loop[loop->num],
>>>>>>>>> ref->id))
>>>>>>>>>      return true;
>>>>>>>>> which allows us to consider reference x[0] as invariant for the test
>>>>>>>>> #pragma omp simd
>>>>>>>>>    for (i = 0; i< 100; i++)
>>>>>>>>>      {
>>>>>>>>>        y[i] = i * 2;
>>>>>>>>>        for (j = 0; j < 100; j++)
>>>>>>>>> z[j] += x[0];
>>>>>>>>>      }
>>>>>>>>>
>>>>>>>>> Moving statement
>>>>>>>>> _9 = *x_19(D);
>>>>>>>>> (cost 20) out of loop 1.
>>>>>>>>
>>>>>>>>
>>>>>>>> outer loops will be automatically considered by outermost_indep_loop
>>>>>>>> which eventually
>>>>>>>> calls the function you are patching.
>>>>>>>>
>>>>>>>> I was questioning the added test
>>>>>>>>
>>>>>>>>         && bitmap_bit_p (&memory_accesses.refs_in_loop[loop->num],
>>>>>>>> ref->id))
>>>>>>>>
>>>>>>>> and still do not understand why you need that.  It makes us only
>>>>>>>> consider the
>>>>>>>> loop of ref itself when looking at safelen (but not refs that belong
>>>>>>>> to inner loops
>>>>>>>> of loop).
>>>>>>>>
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>
>>>>>>>>> 2016-07-07 11:46 GMT+03:00 Richard Biener
>>>>>>>>> <richard.guent...@gmail.com>:
>>>>>>>>>>
>>>>>>>>>> On Wed, Jul 6, 2016 at 4:42 PM, Yuri Rumyantsev
>>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Richard,
>>>>>>>>>>>
>>>>>>>>>>> I don't understand your question and how it is related to proposed
>>>>>>>>>>> patch.
>>>>>>>>>>>
>>>>>>>>>>> 2016-07-06 16:55 GMT+03:00 Richard Biener
>>>>>>>>>>> <richard.guent...@gmail.com>:
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jul 6, 2016 at 3:50 PM, Yuri Rumyantsev
>>>>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> See for example Jakub comments for 70729.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> But how can the check be valid given a
>>>>>>>>>>>>
>>>>>>>>>>>> #pragma omp simd
>>>>>>>>>>>>    for (;;)
>>>>>>>>>>>>      {
>>>>>>>>>>>>         for (;;)
>>>>>>>>>>>>           ...ref in question...
>>>>>>>>>>>>      }
>>>>>>>>>>>>
>>>>>>>>>>>> can be transformed to
>>>>>>>>>>>>
>>>>>>>>>>>> #pragma omp simd
>>>>>>>>>>>>     for (;;)
>>>>>>>>>>>>       {
>>>>>>>>>>>>          ...ref in question...
>>>>>>>>>>>>          ...ref in question..
>>>>>>>>>>>>          ...
>>>>>>>>>>>>       }
>>>>>>>>>>>>
>>>>>>>>>>>> by means of unrolling?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The bitmap check would return false for the not unrolled inner loop
>>>>>>>>>> and true for the inner unrolled loop.
>>>>>>>>>> So it cannot be correct.
>>>>>>>>>>
>>>>>>>>>> (not sure why this is all off-list btw)
>>>>>>>>>>
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>>>> Richard.
>>>>>>>>>>>>
>>>>>>>>>>>>> 2016-07-06 16:25 GMT+03:00 Richard Biener
>>>>>>>>>>>>> <richard.guent...@gmail.com>:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Jul 6, 2016 at 1:43 PM, Yuri Rumyantsev
>>>>>>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I did not learn LIM detailed but I see on the following
>>>>>>>>>>>>>>> example
>>>>>>>>>>>>>>> #pragma omp simd
>>>>>>>>>>>>>>>    for (i = 0; i< 100; i++)
>>>>>>>>>>>>>>>      {
>>>>>>>>>>>>>>>        y[i] = i * 2;
>>>>>>>>>>>>>>>        for (j = 0; j < 100; j++)
>>>>>>>>>>>>>>> z[j] += x[0];
>>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>> that only inner,ost loop is handled:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Memory reference in loop#2 1: *_7
>>>>>>>>>>>>>>> Memory reference in loop#2 2: *x_19(D)
>>>>>>>>>>>>>>> Memory reference in loop#1 3: *_3
>>>>>>>>>>>>>>> Basic block 3 (loop 1 -- depth 1):
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Basic block 4 (loop 2 -- depth 2):
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Loop#2 is innermost
>>>>>>>>>>>>>>> Querying dependency of refs 2 and 1: dependent.
>>>>>>>>>>>>>>> Querying dependencies of ref 2 in loop 2: dependent
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> well, LIM doesn't bother to check against refs in an outer loop
>>>>>>>>>>>>>> if
>>>>>>>>>>>>>> refs in the inner loop are already dependent.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Loop unroll of inner loop does not invalidate 'safelen' of
>>>>>>>>>>>>>>> outer one
>>>>>>>>>>>>>>> accordingly with safelen definition.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So why do you need the bitmap_bit_p check then?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2016-07-06 14:23 GMT+03:00 Richard Biener
>>>>>>>>>>>>>>> <richard.guent...@gmail.com>:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Jul 6, 2016 at 1:17 PM, Yuri Rumyantsev
>>>>>>>>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> LIM phase considers only innermost loops and does not handle
>>>>>>>>>>>>>>>>> loop
>>>>>>>>>>>>>>>>> nests, so in your case the innermost loop (j) does not have
>>>>>>>>>>>>>>>>> non-zero
>>>>>>>>>>>>>>>>> safelen.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Huh?  LIM considers loop nests just fine.  And yes, the
>>>>>>>>>>>>>>>> innermost loop
>>>>>>>>>>>>>>>> does not have safelen but shouldn't it?  Consider the inner
>>>>>>>>>>>>>>>> loop being unrolled
>>>>>>>>>>>>>>>> by GCC so it is removed - should that then invalidate the
>>>>>>>>>>>>>>>> outer loop safelen
>>>>>>>>>>>>>>>> because innermost loop refs now appear in the outer loop?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2016-07-06 13:34 GMT+03:00 Richard Biener
>>>>>>>>>>>>>>>>> <richard.guent...@gmail.com>:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Jul 6, 2016 at 11:50 AM, Yuri Rumyantsev
>>>>>>>>>>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I pointed out in the commentary that REF is defined inside
>>>>>>>>>>>>>>>>>>> loop and
>>>>>>>>>>>>>>>>>>> this check is related to this statement. Should I clarify
>>>>>>>>>>>>>>>>>>> it?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> +  /* We consider REF defined in LOOP as independent if at
>>>>>>>>>>>>>>>>>>> least 2 loop
>>>>>>>>>>>>>>>>>>> +     iterations are not dependent.  */
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Yes, I fail to see why x[0] should not be disambiguated
>>>>>>>>>>>>>>>>>> against y[i] in
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> #pragma GCC loop ivdep
>>>>>>>>>>>>>>>>>>    for (i...)
>>>>>>>>>>>>>>>>>>      {
>>>>>>>>>>>>>>>>>>        y[i] = ...;
>>>>>>>>>>>>>>>>>>        for (j...)
>>>>>>>>>>>>>>>>>>          ... = x[0];
>>>>>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> REF is always inside the loop nest with LOOP being the
>>>>>>>>>>>>>>>>>> outermost loop.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2016-07-06 12:38 GMT+03:00 Richard Biener
>>>>>>>>>>>>>>>>>>> <richard.guent...@gmail.com>:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Tue, Jul 5, 2016 at 4:56 PM, Yuri Rumyantsev
>>>>>>>>>>>>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Here is a simple fix to cure regressions introduced by
>>>>>>>>>>>>>>>>>>>>> my fix for
>>>>>>>>>>>>>>>>>>>>> 70729. Patch also contains minor changes in test found
>>>>>>>>>>>>>>>>>>>>> by Jakub.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Bootstrapping and regression testing did not show any
>>>>>>>>>>>>>>>>>>>>> new failures.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Is it OK for trunk?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> +      && bitmap_bit_p
>>>>>>>>>>>>>>>>>>>> (&memory_accesses.refs_in_loop[loop->num], ref->id))
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> So safelen does not apply to refs in nested loops?  The
>>>>>>>>>>>>>>>>>>>> added comment only
>>>>>>>>>>>>>>>>>>>> explains the safelen check change but this also requires
>>>>>>>>>>>>>>>>>>>> explanation.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>>>>>>>>>>> 2016-07-05  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> PR tree-optimization/71734
>>>>>>>>>>>>>>>>>>>>> * tree-ssa-loop-im.c (ref_indep_loop_p_1): Consider REF
>>>>>>>>>>>>>>>>>>>>> defined in
>>>>>>>>>>>>>>>>>>>>> LOOP as independent if at least two loop iterations are
>>>>>>>>>>>>>>>>>>>>> not dependent.
>>>>>>>>>>>>>>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>>>>>>>>>>>>>>          * g++.dg/vect/pr70729.cc: Delete redundant dg
>>>>>>>>>>>>>>>>>>>>> options, fix style.

Attachment: t1.cc
Description: Binary data

Reply via email to