Dear all,

I've been working on describing the cost of loads/stores on my
target to GCC and ran into the following problem. Consider this code:

    uint64_t sum = 0;
    for(i=0; i<N; i += 2) {  /* N is defined by a macro */
        z0 = buff[i];
        z1 = buff[i+1];
        sum += z0 + z1;
    }

Depending on how buff is declared (local or global array, or parameter
of the function), I get different code for the loop:

For global and local definitions of buff:
$L2:
    ldd r6,8(r10)
    ldd r7,0(r10)
    addi    r10,r10,16
    cmpne   r8,r11,r10
    add r6,r6,r7
    add r9,r9,r6
    bt  r8,$L2

For the parameter, I get this:
$L7:
    add r6,r48,r10
    ldd r8,0(r6)
    ldd r7,0(r11)
    addi    r10,r10,16
    cmpine  r6,r10,1024
    addi    r11,r11,16
    add r7,r7,r8
    add r9,r9,r7
    bt  r6,$L7


I don't see why the compiler handles buff differently when it is a
parameter of the function. It uses two registers for the addresses and
fails to see that it could use a single one with an offset, as it does
in the global/local cases. Any idea why this happens in my code
generation?

Now that I look at it, I wonder whether this is an addressing issue.
Compare how the end test is handled: for local and global buff (where
the compiler has the information about the array), the compare is done
against the end address of the array, whereas for the parameter this is
no longer the case. There it compares a separate counter against a
constant (1024).
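
For reference, here is a hand-written sketch of the loop shape I would
expect for the parameter case as well: one walking pointer, two loads at
constant offsets from it, and an end test against the end address. The
name sum_walk, the uint64_t element type (inferred from the 8-byte
stride in the listings) and N = 128 are assumptions for illustration.
Whether GCC actually produces the shorter loop from this form on my
target is not something I have verified.

```c
#include <stdint.h>

#define N 128  /* assumed value; 128 * 8 bytes matches the 1024 in the listing */

/* Hypothetical helper: same sum as the original loop, written with a
 * single walking pointer and an end-address comparison. */
uint64_t sum_walk(const uint64_t *buff)
{
    const uint64_t *p = buff;        /* single base pointer */
    const uint64_t *end = buff + N;  /* end address, computed once */
    uint64_t sum = 0;

    while (p != end) {
        sum += p[0] + p[1];          /* offsets 0 and 8 from the same base */
        p += 2;                      /* advance by 16 bytes per iteration */
    }
    return sum;
}
```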

I have just now confirmed this by defining the global either as a
pointer or as an array (int *tab or int tab[128];). With the array, I
get the code I would expect. With the pointer, I get the version that
I do not like. Any ideas?
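
In case it helps to reproduce this, below is a minimal sketch of the
variants side by side. The function and variable names are made up for
this example, and I use uint64_t elements here (an assumption based on
the 8-byte stride in the listings above) with N = 128. The loop body is
identical in all three functions; only the way buff is reached differs.

```c
#include <stdint.h>

#define N 128  /* assumed value of the macro */

uint64_t tab_array[N];                /* global array: size is known */
uint64_t *tab_pointer = tab_array;    /* global pointer: size is not known */

/* Array case: the variant that gives the loop I expect. */
uint64_t sum_array(void)
{
    uint64_t sum = 0;
    for (int i = 0; i < N; i += 2)
        sum += tab_array[i] + tab_array[i + 1];
    return sum;
}

/* Global pointer case: the variant that gives the longer loop. */
uint64_t sum_pointer(void)
{
    uint64_t sum = 0;
    for (int i = 0; i < N; i += 2)
        sum += tab_pointer[i] + tab_pointer[i + 1];
    return sum;
}

/* Parameter case: behaves like the pointer case for me. */
uint64_t sum_param(const uint64_t *buff)
{
    uint64_t sum = 0;
    for (int i = 0; i < N; i += 2)
        sum += buff[i] + buff[i + 1];
    return sum;
}
```

Compiling this file to assembly (for example with -O2 -S) and comparing
the three loops should show whether the difference follows the
array/pointer distinction.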

Thank you very much for your help,
Jean Christophe Beyler
