Ah ok, so I can see why it would not be able to perform that
optimization around the loop but I changed the code to simply have
this:
uint64_t foo (void)
{
return data[0] + data[1] + data[2];
}
And this generates :
la r9,data
la r7,data+8
ldd r6,0(r7)
ldd r8,0(r9)
ldd r7,16(r9)
I'm trying to see if there is a problem with my rtx costs function
because again, I don't understand why it would generate 2 la instead
of using an offset of 8 and 16.
Thanks for any input,
Jc
On Wed, Jul 15, 2009 at 1:29 AM, Paolo Bonzini<[email protected]> wrote:
>
>> As you can see, the compiler uses r9 to store data and then uses that
>> for data[0] but also loads in r7 data+8 instead of directly using r9.
>> If I remove the loop then it does not do this.
>
> This optimization is done by CSE only, currently. That's why it cannot look
> through loops.
>
> Paolo
>