On 02/13/2013 11:24 PM, Edgar E. Iglesias wrote:
On Thu, Feb 14, 2013 at 12:36:46AM +0100, Michael Eager wrote:
On 02/13/2013 02:38 PM, Vladimir Makarov wrote:
On 13-02-13 1:36 AM, Michael Eager wrote:
Hi --

I'm seeing register allocation problems and code size increases
with gcc-4.6.2 (and gcc-head) compared with the older gcc-4.1.2.
Both are compiled using -O3.

One test case that I have has a long series of nested if's
each with the same comparison and similar computation.

         if (n<max_no){
           n+=*(cp-*p++);
           if (n<max_no){
             n+=*(cp-*p);
             if (n<max_no){
         . . .          ~20 levels of nesting
               <more computations with 'cp' and 'p'>
         . . . }}}
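
For anyone who wants to experiment, the shape described above can be sketched as a small compilable function. This is a hypothetical reduction, not the original test case: the function name `accumulate`, the parameter types, and the four-level depth are all assumptions, and a reduction this small may well not show the same register pressure as the real ~20-level code.

```c
/* Hypothetical reduction of the nested-if pattern; NOT the original test.
 * Assumptions: the name 'accumulate', unsigned char data, and only four
 * levels of nesting instead of ~20. */
int accumulate(const unsigned char *cp, const unsigned char *p, int max_no)
{
    int n = 0;

    /* Every level repeats the same comparison against max_no and a
     * similar computation using cp and p. */
    if (n < max_no) {
        n += *(cp - *p++);
        if (n < max_no) {
            n += *(cp - *p++);
            if (n < max_no) {
                n += *(cp - *p++);
                if (n < max_no) {
                    n += *(cp - *p);
                }
            }
        }
    }
    return n;
}
```

Compiling a file like this with the MicroBlaze cross gcc using -O3 -S, plus -fdump-rtl-ira and -fdump-rtl-reload, should give the assembly and the IRA and reload dumps to compare between compiler versions.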

Gcc-4.6.2 generates many blocks like the following:
     lwi    r28,r1,68    -- load into dead reg
     lwi    r31,r1,140    -- load p from stack
     lbui    r28,r31,0
     rsubk    r31,r28,r19
     lbui    r31,r31,0
     addk    r29,r29,r31
     swi    r31,r1,308
     lwi    r31,r1,428    -- load of max_no from stack
     cmp    r28,r31,r29    -- n in r29
     bgeid    r28,$L46

gcc-4.1.2 generates the following:
     lbui    r3,r26,3
     rsubk    r3,r3,r19
     lbui    r3,r3,0
     addk    r30,r30,r3
     swi    r3,r1,80
     cmp    r18,r9,r30    -- max_no in r9, n in r30
     bgei    r18,$L6

gcc-4.6.2 (and gcc-head) load max_no from the stack in each block.
There also are extra loads into r28 (which is not used) and r31 at
the start of each block.  Only r28, r29, and r31 are used.

I'm having a hard time telling what is happening or why.  The
IRA dump has this line:
    Ignoring reg 772, has equiv memory
where pseudo 772 is loaded with max_no early in the function.

The reload dump has
Reloads for insn # 254
Reload 0: reload_in (SI) = (reg/v:SI 722 [ max_no ])
     GR_REGS, RELOAD_FOR_INPUT (opnum = 1)
     reload_in_reg: (reg/v:SI 722 [ max_no ])
     reload_reg_rtx: (reg:SI 31 r31)
and similar for each of the other insns using 722.

This is followed by
   Spilling for insn 254.
   Using reg 31 for reload 0
for each insn using pseudo 722.

Any idea what is going on?

So many changes have happened since then (7 years ago) that it is very hard
for me to say anything definite.  I also have no gcc-4.1 for microblaze (as
far as I can see, microblaze was added to public gcc in version 4.6), which
makes it even more difficult for me to say something useful.

First of all, a new RA (IRA) was introduced in gcc4.4, and it uses different
heuristics (Chaitin-Briggs graph coloring vs Chow's priority RA).

We could blame IRA if the RA had the same starting conditions in gcc4.1
and in gcc4.6-gcc-4.8.  But I am sure they are not the same.  More aggressive
optimizations create higher register pressure.  I compared peak reg pressure
in the test for gcc4.6 and gcc4.8.  It became higher (from 102 to 106).
I guess the increase since gcc4.1 was even bigger.

I thought about register pressure causing this, but I think that should cause
spilling of one of the registers which were not used in this long sequence,
rather than causing a large number of additional loads.

Perhaps the cost analysis has a problem.

RA is focused on generating faster code.  Looking at the fragment you
provided, it is hard to say anything about it.  I tried -Os for gcc4.8 and it
generates the desired code for the fragment in question (by the way, the peak
register pressure decreased to 66 in this case).

It's both larger and slower, since the additional loads take much longer.
I'll take a look at -Os.

It looks like the values of p++ are being pre-calculated and stored on the
stack.  This results in a load, rather than an increment of a register.

Hi,

I remember having a similar issue about a year ago. IIRC, I found that
the ivopts pass was transforming things badly for microblaze. Disabling
it helped a lot.

I can't tell if you are seeing the same thing, but it might be worth
trying -fno-ivopts in case you haven't already.

Thanks.  I'll see if that helps.


--
Michael Eager    [email protected]
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077
