Hi Roy,

I guess SMS didn't pipeline your loop. The "prologue" code you mention in your email is just one iteration peeled off from the loop; it has nothing to do with the prologue of a modulo-scheduled loop.
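For the loop in the 12/8 email quoted below, a peeled iteration looks roughly like the following C-level sketch. This is a hand-written illustration, not GCC output. The function names and the trip-count parameter `n` are hypothetical (the original loop hard-codes 100). The first iteration is copied in front of the loop and the loop body is left as it was. A modulo-scheduled loop would instead have a kernel that overlaps the loads, stores and pointer updates of different iterations.

```c
#include <assert.h>

/* The loop from the 12/8 email, with the trip count passed in as n
   (hypothetical parameter; the original hard-codes 100). */
void copy_strided(const long long *y, long long *x, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        *x = *y;
        x += 20;
        y += 30;
    }
}

/* The same loop with its first iteration peeled off. The loop body is
   unchanged and nothing from one iteration overlaps the next, so this
   is loop peeling, not software pipelining. */
void copy_strided_peeled(const long long *y, long long *x, int n)
{
    int i;

    assert(n >= 1);  /* the peeled copy always executes once */

    /* Peeled iteration 0. */
    *x = *y;
    x += 20;
    y += 30;

    /* Remaining iterations 1 .. n-1, same body as before. */
    for (i = 1; i < n; i++) {
        *x = *y;
        x += 20;
        y += 30;
    }
}
```

Both functions store the same values. Only the shape of the control flow differs, which matches what you saw in the generated code.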
I think there are two reasons that can explain why your code is not pipelined:

1. Alias information is not sufficient to disambiguate x and y. x and y are pointers passed in from outside, and currently, at least in the SMS phase, GCC does not know whether x aliases y. This may prevent GCC from pipelining your loop. As far as I'm aware, alias information from the array data dependence stage is not propagated to SMS; at least I didn't find it in the main trunk. See the last bullet in the "In Progress" section here: http://gcc.gnu.org/wiki/SwingModuloScheduling
   Andrey, correct me if I'm wrong.

2. GCC does not pipeline loops that contain "auto-inc/post-inc" operations. See lines 1025 and 1039 in modulo-sched.c (gcc-4.5.1). Please try the codelet below. It gets pipelined once you comment out line 1025 in gcc-4.5.1 and rebuild your compiler.

void foo(void)
{
  int ii, jj, kk;
  int R0, R1, R2, R3;

  for (ii = 1; ii < 12; ii++)
    {
      for (jj = 0; jj < ii; jj++)
        {
          (*((int *) ((char *) R3 + 0))) = R0;
          R3 += 4;
          R0 = (*((int *) ((char *) R2 + 0)));
          R2 = R2 + 48;
        }
    }
}

I hope this helps.

Gan

2010/12/8 roy rosen:
> I have tried to play a bit with SMS on ia64 and I can't understand
> what it is doing.
> It seems that instead of getting some of the first insns out of the
> loop into the prologue it simply gets an entire iteration out of the
> loop and the loop's content stays approximately the same.
>
> For example for
>
> void x(long long* y, long long* x)
> {
>   int i;
>   for (i = 0; i < 100; i++)
>     {
>       *x = *y;
>       x+=20;y+=30;
>     }
> }
>
> with ./cc1 ./a.c -O3 -fmodulo-sched.
> Can someone show an example where it actually works as it should?
>
> Roy.
>
> 2010/11/10 Andrey Belevantsev:
>> Hi,
>>
>> On 10.11.2010 12:32, roy rosen wrote:
>>>
>>> Hi,
>>>
>>> I was wondering if gcc has software pipelining.
>>> I saw options -fsel-sched-pipelining -fselective-scheduling
>>> -fselective-scheduling2 but I don't see any pipelining happening
>>> (tried with ia64).
>>> Is there a gcc VLIW port in which I can see it working?
>>
>> You need to try -fmodulo-sched. Selective scheduling works by default on
>> ia64 with -O3, otherwise you need -fselective-scheduling2
>> -fsel-sched-pipelining. Note that selective scheduling disables autoinc
>> generation for the pipelining to work, and modulo scheduling will likely
>> refuse to pipeline a loop with autoincs.
>>
>> Modulo scheduling implementation in GCC may be improved, but that's a
>> different topic.
>>
>> Andrey
>>
>>>
>>> For an example function like
>>>
>>> int nor(char* __restrict__ c, char* __restrict__ d)
>>> {
>>>   int i, sum = 0;
>>>   for (i = 0; i < 256; i++)
>>>     d[i] = c[i] << 3;
>>>   return sum;
>>> }
>>>
>>> without pipelining the code would be something like
>>>
>>>   r1 = 0
>>>   r2 = c
>>>   r3 = d
>>> _startloop
>>>   if r1 == 256 jmp _end
>>>   r4 = [r2]+
>>>   r4 <<= 3
>>>   [r3]+ = r4
>>>   r1++
>>>   jmp _startloop
>>> _end
>>>
>>> Here, inside the loop, the 3 insns form a data dependency chain
>>> (only the r1++ is independent), which does not permit any parallelism.
>>>
>>> With pipelining I expect code like
>>>
>>>   r1 = 2
>>>   r2 = c
>>>   r3 = d
>>>   // prologue: load iterations 0 and 1, shift iteration 0
>>>   r4 = [r2]+
>>>   r4 <<= 3
>>>   r5 = [r2]+
>>> _startloop
>>>   if r1 == 256 jmp _end
>>>   [r3]+ = r4 ; r4 = r5 << 3 ; r5 = [r2]+
>>>   r1++
>>>   jmp _startloop
>>> _end
>>>   // epilogue: finish the last two iterations
>>>   [r3]+ = r4
>>>   r4 = r5 << 3
>>>   [r3]+ = r4
>>>
>>> Now the data dependency is broken and parallelism is possible.
>>> As I said, I could not see that happening.
>>> Can someone please tell me on which port and with what options I can
>>> get such a result?
>>>
>>> Thanks, Roy.
>>
>>
>
-- 
Best Regards
Gan
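Roy's expected pipelined pseudocode for nor() can also be written at the C level. The following is a hand-written sketch, not GCC output, and the function names are hypothetical. It uses `unsigned char` instead of `char` so the left shift is well defined for every byte value, uses standard `restrict` in place of `__restrict__`, and assumes at least two iterations. It shows the three parts a modulo scheduler has to produce: a prologue that starts the first iterations, a kernel in which each statement works on a different iteration, and an epilogue that finishes the last ones.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward loop, as in nor(): load, shift and store of one
   iteration form a dependency chain. */
void shift_plain(const unsigned char *restrict c,
                 unsigned char *restrict d, size_t n)
{
    for (size_t i = 0; i < n; i++)
        d[i] = (unsigned char)(c[i] << 3);
}

/* Hand-pipelined version with three stages: load, shift, store. */
void shift_pipelined(const unsigned char *restrict c,
                     unsigned char *restrict d, size_t n)
{
    assert(n >= 2);  /* the prologue starts two iterations */

    /* Prologue: load and shift iteration 0, load iteration 1. */
    unsigned int shifted = (unsigned int)c[0] << 3;
    unsigned int loaded = c[1];

    /* Kernel: each statement reads only a value produced in the previous
       kernel iteration, so on a VLIW machine the three could be issued
       together. In sequential C the order store, shift, load matters. */
    for (size_t i = 0; i + 2 < n; i++) {
        d[i] = (unsigned char)shifted;  /* store, iteration i     */
        shifted = loaded << 3;          /* shift, iteration i + 1 */
        loaded = c[i + 2];              /* load,  iteration i + 2 */
    }

    /* Epilogue: drain the last two iterations. */
    d[n - 2] = (unsigned char)shifted;
    d[n - 1] = (unsigned char)(loaded << 3);
}
```

With n = 256 the kernel runs 254 times, matching the r1 = 2 .. 255 range of the pseudocode, and the epilogue stores d[254] and d[255].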
