Re: software pipelining

Gan Wed, 08 Dec 2010 09:14:21 -0800

Hi Roy,

I guess SMS didn't pipeline your loop. The
"prologue" code mentioned in your email is
an iteration peeled off from the loop; it has
nothing to do with a software-pipelining prologue.


I think there are two reasons that can explain why
your code is not pipelined:

1. Alias information is not enough to disambiguate
x and y. x and y are pointers passed in from outside. Currently,
at least in the SMS phase, GCC does not know whether
x aliases y. This may prevent GCC from pipelining
your loop. As far as I'm aware, alias information from
the array data dependence stage is not propagated to SMS;
at least I didn't find it in the main trunk. See the last bullet
in the "In Progress" section here:
http://gcc.gnu.org/wiki/SwingModuloScheduling
Andrey, correct me if I'm wrong.

2. GCC does not pipeline loops that contain "auto-inc/post-inc"
operations. See lines 1025 and 1039 in modulo-sched.c (gcc-4.5.1).
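
As a rough illustration of the two points above, here is a sketch of
Roy's loop with restrict-qualified pointers and explicit indexing instead
of pointer bumps. The function name copy_strided and the parameter names
are hypothetical. I have not checked that this form gets pipelined:
whether the restrict information reaches SMS, and whether the target
still turns the indexing into auto-increments, depends on the GCC version
and the port.

```c
#include <stddef.h>

/* Hypothetical rewrite of the loop from Roy's mail.
   dst plays the role of x (stride 20), src the role of y (stride 30).
   restrict promises the compiler that dst and src do not overlap. */
void copy_strided(long long *restrict dst, const long long *restrict src)
{
    int i;

    for (i = 0; i < 100; i++)
        dst[(size_t)i * 20] = src[(size_t)i * 30];
}
```

The caller must supply at least 1981 elements for dst and 2971 for src,
since the last iteration touches dst[1980] and src[2970].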

Please try the codelet below. It works after you comment out
line 1025 of modulo-sched.c in gcc-4.5.1 and rebuild your compiler.

void foo(void)
{
  int ii, jj, kk;
  int R0, R1, R2, R3;

  for (ii = 1; ii < 12; ii++)
  {
    for (jj = 0; jj < ii; jj++)
    {
      (*((int *) ((char *) R3 + 0))) = R0;
      R3 += 4;
      R0 = (*((int *) ((char *) R2 + 0)));
      R2 = R2 + 48;
    }
  }
}
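
The codelet above is only meant to be compiled so that the SMS dump can
be inspected; R0..R3 are uninitialized ints used as addresses, so it
should not be run. If a runnable form is wanted, a sketch along these
lines keeps the same store/load pattern with real pointers. The name
foo_ptrs and its parameters are hypothetical, and the strides assume a
4-byte int.

```c
/* Sketch of the codelet with the "registers" passed in as real pointers,
   so the function has defined behaviour and can be tested.
   r3 must point to at least 66 ints, r2 to at least 792 ints. */
int foo_ptrs(int *r3, const int *r2, int r0)
{
    int ii, jj;

    for (ii = 1; ii < 12; ii++) {
        for (jj = 0; jj < ii; jj++) {
            *r3 = r0;   /* store the value loaded one iteration earlier */
            r3 += 1;    /* R3 += 4 bytes, assuming 4-byte int */
            r0 = *r2;   /* load the value for the next iteration */
            r2 += 12;   /* R2 += 48 bytes, assuming 4-byte int */
        }
    }
    return r0;          /* last value loaded */
}
```

The inner body runs 66 times in total (1 + 2 + ... + 11), which is where
the buffer sizes in the comment come from.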


I hope this helps.

Gan


2010/12/8 roy rosen <[email protected]>:
> I have tried to play a bit with SMS on ia64 and I can't understand
> what it is doing.
> It seems that instead of getting some of the first insns out of the
> loop into the prologue it simply gets an entire iteration out of the
> loop and the loop's content stays approximately the same.
>
> For example for
>
> void x(long long*  y, long long* x)
> {
>    int i;
>    for (i = 0; i < 100; i++)
>    {
>        *x = *y;
>        x+=20;y+=30;
>    }
> }
>
> with ./cc1 ./a.c -O3 -fmodulo-sched.
> Can someone show an example where it actually works as it should?
>
> Roy.
>
> 2010/11/10 Andrey Belevantsev <[email protected]>:
>> Hi,
>>
>> On 10.11.2010 12:32, roy rosen wrote:
>>>
>>> Hi,
>>>
>>> I was wondering if gcc has software pipelining.
>>> I saw options -fsel-sched-pipelining -fselective-scheduling
>>> -fselective-scheduling2 but I don't see any pipelining happening
>>> (tried with ia64).
>>> Is there a gcc VLIW port in which I can see it working?
>>
>> You need to try -fmodulo-sched.  Selective scheduling works by default on
>> ia64 with -O3, otherwise you need -fselective-scheduling2
>> -fsel-sched-pipelining.  Note that selective scheduling disables autoinc
>> generation for the pipelining to work, and modulo scheduling will likely
>> refuse to pipeline a loop with autoincs.
>>
>> Modulo scheduling implementation in GCC may be improved, but that's a
>> different topic.
>>
>> Andrey
>>
>>>
>>> For an example function like
>>>
>>> int nor(char* __restrict__ c, char* __restrict__ d)
>>> {
>>>     int i, sum = 0;
>>>     for (i = 0; i < 256; i++)
>>>         d[i] = c[i] << 3;
>>>     return sum;
>>> }
>>>
>>> with no pipelining a code like
>>>
>>> r1 = 0
>>> r2 = c
>>> r3 = d
>>> _startloop
>>> if r1 == 256 jmp _end
>>> r4 = [r2]+
>>> r4>>= r4
>>> [r3]+ = r4
>>> r1++
>>> jmp _startloop
>>> _end
>>>
>>> here inside the loop there is a data dependency between all 3 insns
>>> (only the r1++ is independent) which does not permit any parallelism
>>>
>>> with pipelining I expect a code like
>>>
>>> r1 = 2
>>> r2 = c
>>> r3 = d
>>> // peel first iteration
>>> r4 = [r2]+
>>> r4>>= r4
>>> r5 = [r2]+
>>> _startloop
>>> if r1 == 256 jmp _end
>>> [r3]+ = r4 ; r4>>= r5 ; r5 = [r2]+
>>> r1++
>>> jmp _startloop
>>> _end
>>>
>>> Now the data dependency is broken and parallelism is possible.
>>> As I said I could not see that happening.
>>> Can someone please tell me on which port and with what options can I
>>> get such a result?
>>>
>>> Thanks, Roy.
>>
>>
>



-- 
Best Regards

Gan
