Hi,

On Mon, 18 Dec 2017, Richard Biener wrote:

> where *unroll is similar to *max_vf I think.  dist_v[0] is the innermost loop.

[0] is always outermost loop.

> The vectorizer does way more complicated things and only looks at the 
> distance with respect to the outer loop as far as I can see which can be 
> negative.
> 
> Not sure if fusion and vectorizer "interleaving" makes a difference 
> here. I think the idea was that when interleaving stmt-by-stmt then 
> forward dependences would be preserved and thus we don't need to check 
> the inner loop dependences.  speaking with "forward vs. backward" 
> dependences again, not distances...
> 
> This also means that unroll-and-jam could be enhanced to "interleave" 
> stmts and thus cover more cases?
> 
> Again, I hope Micha can have a look here...

Haven't yet looked at the patch, but some comments anyway:

fusion and interleaving interact in the following way in outer loop 
vectorization, conceptually:
* (1) the outer loop is unrolled
* (2) the inner loops are fused
* (3) the (now single) inner body is rescheduled/shuffled/interleaved.

(1) is always okay.  But (2) and (3) as individual transformations must 
both be checked for validity.  If fusion isn't possible the whole 
transformation is invalid, and if interleaving isn't possible the same is 
true.  In the specific example:

  for (b = 4; b >= 0; b--)
    for (c = 0; c <= 6; c++)
      t = a[c][b + 1];      // S1
      a[c + 1][b + 2] = t;  // S2

it's already the fusion step that's invalid.  There's a 
dependence between S1 and S2, e.g. for (b,c) = (4,1) comes-before (3,0) 
with S1(4,1) reading a[1][5] and S2(3,0) writing a[1][5].  So a 
write-after-read.  After fusing:

   for (c = 0; c <= 6; c++)
     {
       t = a[c][5];              // S1
       a[c + 1][6] = t;
       t = a[c][4];
       a[c + 1][5] = t;          // S2
       a[c + 1][4] = a[c][3];
       a[c + 1][3] = a[c][2];
     }

here we have at iterations (c) = (0) comes-before (1), at S2(0) writing 
a[1][5] and S1(1) writing a[1][5].  I.e. now it's a read-after-write (the 
write in iteration 0 overwrites the value that is going to be read at 
iteration 1, which wasn't the case in the original loop).  The dependence 
switched direction --> invalid.

The simple interleaving of statements can't rectify this.  
Interleaving is an inner-body reordering but the brokenness comes from a 
cross-iteration ordering.

This example can be unroll-jammed or outer-loop vectorized if one of the 
two loops is reversed.  Let's say we reverse the inner loop, so that it 
runs in the same direction as the outer loop (reversal is possible here).

It'd then be something like:

   for (c = 6; c >= 0; c--)
     {
       t = a[c][5];              // S1
       a[c + 1][6] = t;
       t = a[c][4];
       a[c + 1][5] = t;          // S2
       a[c + 1][4] = a[c][3];
       a[c + 1][3] = a[c][2];
     }

The dependence between S1/S2 would still be a write-after-read, and all 
would be well.  This reversal of the inner loop can partly be simulated by 
not only interleaving the inner insns, but by also _reodering_ them.  But 
AFAIK the vectorizer doesn't do this?


Ciao,
Michael.

Reply via email to