On Thu, Feb 14, 2013 at 12:36:46AM +0100, Michael Eager wrote:
> On 02/13/2013 02:38 PM, Vladimir Makarov wrote:
> > On 13-02-13 1:36 AM, Michael Eager wrote:
> >> Hi --
> >>
> >> I'm seeing register allocation problems and code size increases
> >> with gcc-4.6.2 (and gcc-head) compared with older (gcc-4.1.2).
> >> Both are compiled using -O3.
> >>
> >> One test case that I have has a long series of nested if's
> >> each with the same comparison and similar computation.
> >>
> >> if (n<max_no){
> >> n+=*(cp-*p++);
> >> if (n<max_no){
> >> n+=*(cp-*p);
> >> if (n<max_no){
> >> . . . ~20 levels of nesting
> >> <more computations with 'cp' and 'p'>
> >> . . . }}}
> >>
> >> Gcc-4.6.2 generates many blocks like the following:
> >> lwi r28,r1,68 -- load into dead reg
> >> lwi r31,r1,140 -- load p from stack
> >> lbui r28,r31,0
> >> rsubk r31,r28,r19
> >> lbui r31,r31,0
> >> addk r29,r29,r31
> >> swi r31,r1,308
> >> lwi r31,r1,428 -- load of max_no from stack
> >> cmp r28,r31,r29 -- n in r29
> >> bgeid r28,$L46
> >>
> >> gcc-4.1.2 generates the following:
> >> lbui r3,r26,3
> >> rsubk r3,r3,r19
> >> lbui r3,r3,0
> >> addk r30,r30,r3
> >> swi r3,r1,80
> >> cmp r18,r9,r30 -- max_no in r9, n in r30
> >> bgei r18,$L6
> >>
> >> gcc-4.6.2 (and gcc-head) load max_no from the stack in each block.
> >> There also are extra loads into r28 (which is not used) and r31 at
> >> the start of each block. Only r28, r29, and r31 are used.
> >>
> >> I'm having a hard time telling what is happening or why. The
> >> IRA dump has this line:
> >> Ignoring reg 772, has equiv memory
> >> where pseudo 772 is loaded with max_no early in the function.
> >>
> >> The reload dump has
> >> Reloads for insn # 254
> >> Reload 0: reload_in (SI) = (reg/v:SI 722 [ max_no ])
> >> GR_REGS, RELOAD_FOR_INPUT (opnum = 1)
> >> reload_in_reg: (reg/v:SI 722 [ max_no ])
> >> reload_reg_rtx: (reg:SI 31 r31)
> >> and similar for each of the other insns using 722.
> >>
> >> This is followed by
> >> Spilling for insn 254.
> >> Using reg 31 for reload 0
> >> for each insn using pseudo 722.
> >>
> >> Any idea what is going on?
> >>
> > So many changes have happened since then (7 years ago) that it is very
> > hard for me to say anything definite. I also have no gcc-4.1 microblaze
> > (as I see, microblaze was added to public gcc in version 4.6), which
> > makes it even more difficult for me to say something useful.
> >
> > First of all, the new RA was introduced in gcc4.4 (IRA) which uses
> > different heuristics (Chaitin-Briggs graph coloring vs Chow's
> > priority RA).
> >
> > We could blame IRA only if the RA saw the same starting conditions in
> > gcc4.1 and in gcc4.6 through gcc4.8. But I am sure they are not the
> > same. More aggressive optimizations create higher register pressure.
> > I compared peak reg pressure in the test for gcc4.6 and gcc4.8. It
> > became higher (from 102 to 106). I guess the increase since gcc4.1
> > was even bigger.
>
> I thought about register pressure causing this, but I think that should cause
> spilling of one of the registers which were not used in this long sequence,
> rather than causing a large number of additional loads.
>
> Perhaps the cost analysis has a problem.
>
> > RA is focused on generating faster code. Looking only at the fragment
> > you provided, it is hard to say anything about it. I tried -Os for
> > gcc4.8 and it generates desirable code for the fragment in question
> > (by the way, the peak register pressure decreased to 66 in this case).
>
> It's both larger and slower, since the additional loads take much longer.
> I'll take a look at -Os.
>
> It looks like the values of p++ are being pre-calculated and stored on the
> stack. This results in a load, rather than an increment of a register.
Hi,
I remember having a similar issue about a year ago. IIRC, I found that
the ivopts pass was transforming things badly for microblaze. Disabling
it helped a lot.
I can't tell if you are seeing the same thing, but it might be worth
trying -fno-ivopts in case you haven't already.
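
In case it helps with the comparison, here is a minimal sketch of the
nested-if pattern from the test case, cut down to four levels. The function
name, the unsigned char types and the return value are my assumptions and
are not taken from the real code.

```c
/* Hypothetical reduction of the quoted test case: every level repeats the
   same comparison against max_no and a similar load through cp and p.
   Name, types and nesting depth are assumptions for illustration only. */
int accumulate(const unsigned char *cp, const unsigned char *p,
               int n, int max_no)
{
    if (n < max_no) {
        n += *(cp - *p++);              /* byte at p is an offset back from cp */
        if (n < max_no) {
            n += *(cp - *p++);
            if (n < max_no) {
                n += *(cp - *p++);
                if (n < max_no) {
                    n += *(cp - *p);    /* the real case nests ~20 deep */
                }
            }
        }
    }
    return n;
}
```

Compiling something like this with -O3 -S, once with and once without
-fno-ivopts, and diffing the two assembly files should show whether ivopts
is involved. A reduction this small may not create enough register pressure
to reproduce the reloads, so the original test case is the better one to try.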
Cheers,
Edgar