Re: MIPS: 2'nd pass of ira, causes weird register allocation for 2-op mult

Klaus Pedersen Sun, 03 Jun 2012 10:17:50 -0700

On Tue, May 29, 2012 at 6:55 AM, Vladimir Makarov <vmaka...@redhat.com> wrote:
> On 05/28/2012 03:09 PM, Richard Sandiford wrote:
>>
>> Klaus Pedersen<proje...@gmail.com>  writes:
>>>
>>> The summary goes something like this:
>>>
>>> It is possible for the second pass of ira to get confused and decide that
>>> NO_REGS or a hard float register are better choices for the result of the
>>> 2 operand mult. First pass already optimally allocated in
>>> GR_AND_MD1_REGS.
>>
>> Yeah.  I'm afraid this is something I've been sitting on for a while now.
>>
>> I think the only practical way of calculating accurate costs for things
>> like GR_AND_MD_REGS really is to count the cost of the constituent classes
>> and then take their MAX.
>>
>> Vlad, what do you think?  Is the above exclude_p code "just" a
>> compile-time speed-up?
>
> Yes, I think so.  Every cost pass is very expensive and practically
> proportional to the number of classes in consideration.
>
> Probably, excluding some classes was a bad solution to speed IRA up.  Or
> maybe we need the pressure classes calculation improvements.  As I remember I
> tried long ago to calculate IRA cover classes automatically and it did not
> work.  Pressure classes calculation is analogous to the cover classes
> calculation but it is less critical for register pressure sensitive insn
> scheduling.
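
Richard's suggestion above (cost a union class such as GR_AND_MD_REGS as the
MAX of the costs of its constituent classes) can be sketched roughly as
follows. This is only an illustration under stated assumptions: the enum, the
cost table and the function name are hypothetical and are not taken from
ira-costs.c, and the example costs are made up.

```c
#include <assert.h>

/* Hypothetical stand-ins for register classes; NOT GCC's enum reg_class.  */
enum toy_class { TOY_GR_REGS, TOY_MD_REGS, TOY_NUM_CLASSES };

/* Return the cost of a union class, taken as the maximum of the costs of
   its N constituent classes.  COST is indexed by enum toy_class.  */
int
toy_union_class_cost (const int *cost, const enum toy_class *members, int n)
{
  int i, max;

  assert (n > 0);
  max = cost[members[0]];
  for (i = 1; i < n; i++)
    if (cost[members[i]] > max)
      max = cost[members[i]];
  return max;
}
```

With made-up costs of 2 for the GPR-like class and 6 for the MD-like class,
the union would be costed at 6, i.e. never cheaper than its most expensive
member.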


As a test, I tried searching all classes, with ira-exhausive-search.patch:

--- gcc-4.7-20120526-orig/gcc/ira-costs.c       2012-06-03 19:01:00.861129575 +0800
+++ gcc-4.7-20120526/gcc/ira-costs.c    2012-06-03 19:01:16.854081473 +0800
@@ -258,7 +258,7 @@ setup_regno_cost_classes_by_aclass (int
       for (i = 0; i < ira_important_classes_num; i++)
        {
          cl = ira_important_classes[i];
-         if (exclude_p)
+
            {
              /* Exclude no-pressure classes which are subsets of
                 ACLASS.  */

This didn't make any difference to the output (at least not with -mips1 and
-O2). Probably my patch is not doing the right thing!

My tree is around 1500 files, which gcc compiles into 3515246 lines of assembly.

Next I disabled the second pass, with ira-no_2nd_pass.patch:

--- gcc-4.7-20120526-orig/gcc/ira-costs.c       2012-06-03 19:01:00.861129575 +0800
+++ gcc-4.7-20120526/gcc/ira-costs.c    2012-06-03 19:05:45.054289701 +0800
@@ -1537,7 +1537,8 @@ find_costs_and_classes (FILE *dump_file)
      use for each allocno.  However, if -fexpensive-optimizations are
      on, we do so twice, the second time using the tentative best
      classes to guide the selection.  */
-    for (pass = start; pass <= flag_expensive_optimizations; pass++)
+
+    pass = start;
     {
       if ((!allocno_p || internal_flag_ira_verbose > 0) && dump_file)
        fprintf (dump_file,

This improved things a lot. 280 files changed, and many of them improved.

My original fix, which uses a saner cost for ACC_REGS, is gpr_acc_cost_3.patch:

--- gcc-4.7-20120526-orig/gcc/config/mips/mips.c        2012-06-03 19:28:02.137960837 +0800
+++ gcc-4.7-20120526/gcc/config/mips/mips.c     2012-06-03 19:31:12.587399458 +0800
@@ -11258,7 +11258,7 @@ mips_move_to_gpr_cost (enum machine_mode

     case ACC_REGS:
       /* MFLO and MFHI.  */
-      return 6;
+      return 3;

     case FP_REGS:
       /* MFC1, etc.  */
@@ -11294,7 +11294,7 @@ mips_move_from_gpr_cost (enum machine_mo

     case ACC_REGS:
       /* MTLO and MTHI.  */
-      return 6;
+      return 3;

     case FP_REGS:
       /* MTC1, etc.  */

This also improved things a lot. Again around 280 files changed. This got
the number of generated assembly lines down to 3513371 (almost 2000 lines
fewer than the original).

Funnily enough, combining the two patches didn't bring any benefit (the
result was actually 700 lines worse).

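The files below had the biggest changes. The first three number columns are
assembly line counts (Orig = unpatched, no2nd = ira-no_2nd_pass.patch,
gprcost = gpr_acc_cost_3.patch); the last two are the deltas of gprcost and
no2nd relative to Orig:
                        Orig    no2nd   gprcost d_gpr   d_no2nd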
xc_surface.o.s          27158   26940   26700   -458    -218
xc_adpcm.o.s            4137    3994    4004    -133    -143
xc_blit_1.o.s           4199    4199    4101    -98     +0
xc_ts_calib.o.s         276     245     245     -31     -31
xc_camera.o.s           3349    3363    3321    -28     +14
xc_glyph.o.s            10048   10203   10027   -21     +155
xc_blit_A.o.s           10991   10851   10992   +1      -140
xc_events.o.s           7141    7141    7160    +19     +0
xc_miarc.o.s            16172   16317   16235   +63     +145

As can be seen, there is no clear pattern, except that gpr_acc_cost_3.patch
generally does better. But there are some cases where it is worse (the
xc_miarc file). xc_blit_A is also interesting, as ira-no_2nd_pass.patch
improved it a lot.
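
For anyone who wants to re-check the delta columns, here is a minimal sketch
that recomputes them from the line counts in the table. The struct and
function names are hypothetical helpers introduced only for this
illustration; the numbers are copied from the table above.

```c
#include <stddef.h>

/* One table row: assembly line counts per build (names are hypothetical).  */
struct asm_count { const char *file; int orig; int no2ndpass; int gprcost; };

static const struct asm_count counts[] = {
  { "xc_surface.o.s",  27158, 26940, 26700 },
  { "xc_adpcm.o.s",     4137,  3994,  4004 },
  { "xc_blit_1.o.s",    4199,  4199,  4101 },
  { "xc_ts_calib.o.s",   276,   245,   245 },
  { "xc_camera.o.s",    3349,  3363,  3321 },
  { "xc_glyph.o.s",    10048, 10203, 10027 },
  { "xc_blit_A.o.s",   10991, 10851, 10992 },
  { "xc_events.o.s",    7141,  7141,  7160 },
  { "xc_miarc.o.s",    16172, 16317, 16235 },
};

#define N_COUNTS (sizeof counts / sizeof counts[0])

/* Delta as used in the table: patched count minus original count.  */
int
line_delta (int patched, int orig)
{
  return patched - orig;
}

/* Sum of the gpr_acc_cost_3.patch deltas over the listed files.  */
int
total_delta_gprcost (void)
{
  size_t i;
  int sum = 0;

  for (i = 0; i < N_COUNTS; i++)
    sum += line_delta (counts[i].gprcost, counts[i].orig);
  return sum;
}

/* Sum of the ira-no_2nd_pass.patch deltas over the listed files.  */
int
total_delta_no2ndpass (void)
{
  size_t i;
  int sum = 0;

  for (i = 0; i < N_COUNTS; i++)
    sum += line_delta (counts[i].no2ndpass, counts[i].orig);
  return sum;
}
```

Summed over just these nine files, the deltas come to -686 lines for
gpr_acc_cost_3.patch and -218 lines for ira-no_2nd_pass.patch.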


Generally the problems look like this:
        mflo    $14
        sw      $14,4($sp)
        lw      $15,4($sp)
        mult    $2,$10
(the copy in 4($sp) is never used)

Strangely, the 19 additional lines in xc_events.o.s are all caused by bogus
moves. Strangely, because the patch only changes the cost of ACC_REGS,
which is not used here:
        addiu   $4,$4,%lo(.LC0)
        jal     dbg_AssertUtil
        sw      $2,48($sp)
        lw      $2,48($sp)
(the copy in 48($sp) is never used)


Later I will try to extract some test cases.


BR, Klaus


>
>
>>   Or is it a conceptual part of the algorithm?
>
> No.
>
>>   More generally,
>> what kind of situations does the second pass help with?
>
> I cannot show such situations right now, but I did some benchmarking long
> ago on the old RA and the second pass is really important for better code
> generation.  At that time I even thought about a 3rd pass for -O3.  I don't
> think the situation is different now.
>
> Cost pass is a complicated part.  It is impossible to find some good
> literature which could help.  The problem is in GCC compiler specifics when
> code selection is not done (at least fully) before RA and we don't know
> until reload end what  alternative will be used.  So some code selection is
> done in combiner, some in IRA (including cost pass by defining allocation
> classes for pseudos) and final code selection is done in reload.
>
> I thought about different approaches to cost pass but failed to find a
> better heuristic approach.  There is an optimal solution to problem being
> solved by cost pass but it requires ILP and that is not practical because of
> its slowness.
>
>>   I.e. how does
>> it improve on the first pass?
>>
>>
> Richard, thanks for the detailed analysis.  I'll try to do some benchmarks
> to measure compilation time.  If the slowdown is tolerable, I will remove the
> code you mentioned (excluding subsets of pressure classes) and submit the patch.
>
