On Tue, May 29, 2012 at 6:55 AM, Vladimir Makarov <vmaka...@redhat.com> wrote:
> On 05/28/2012 03:09 PM, Richard Sandiford wrote:
>>
>> Klaus Pedersen <proje...@gmail.com> writes:
>>>
>>> The summary goes something like this:
>>>
>>> It is possible for the second pass of ira to get confused and decide that
>>> NO_REGS or a hard float register are better choices for the result of the
>>> 2 operand mult. First pass already optimally allocated in
>>> GR_AND_MD1_REGS.
>>
>> Yeah. I'm afraid this is something I've been sitting on for a while now.
>>
>> I think the only practical way of calculating accurate costs for things
>> like GR_AND_MD_REGS really is to count the cost of the constituent classes
>> and then take their MAX.
>>
>> Vlad, what do you think? Is the above exclude_p code "just" a
>> compile-time speed-up?
>
> Yes, I think so. Every cost pass is very expensive and practically
> proportional to the number of classes under consideration.
>
> Probably, excluding some classes was a bad solution to speed IRA up. Or maybe
> we need improvements to the pressure classes calculation. As I remember, I
> tried long ago to calculate IRA cover classes automatically and it did not
> work. The pressure classes calculation is analogous to the cover classes
> calculation, but it is less critical for register pressure sensitive insn
> scheduling.
As a test, I tried to search all classes:

ira-exhausive-search.patch
--- gcc-4.7-20120526-orig/gcc/ira-costs.c	2012-06-03 19:01:00.861129575 +0800
+++ gcc-4.7-20120526/gcc/ira-costs.c	2012-06-03 19:01:16.854081473 +0800
@@ -258,7 +258,7 @@ setup_regno_cost_classes_by_aclass (int
       for (i = 0; i < ira_important_classes_num; i++)
         {
           cl = ira_important_classes[i];
-          if (exclude_p)
+
             {
               /* Exclude no-pressure classes which are subsets of
                  ACLASS.  */

This didn't make any difference to the output (at least not with -mips1
and -O2). Probably my patch is not doing the right thing!

My tree is around 1500 files, which gcc compiles into 3515246 lines of
assembly.

Next I disabled the second pass with:

ira-no_2nd_pass.patch
--- gcc-4.7-20120526-orig/gcc/ira-costs.c	2012-06-03 19:01:00.861129575 +0800
+++ gcc-4.7-20120526/gcc/ira-costs.c	2012-06-03 19:05:45.054289701 +0800
@@ -1537,7 +1537,8 @@ find_costs_and_classes (FILE *dump_file)
      use for each allocno.  However, if -fexpensive-optimizations are
      on, we do so twice, the second time using the tentative best
      classes to guide the selection.  */
-  for (pass = start; pass <= flag_expensive_optimizations; pass++)
+
+  pass = start;
     {
       if ((!allocno_p || internal_flag_ira_verbose > 0) && dump_file)
         fprintf (dump_file,

This improved things a lot: 280 files changed, and many of them improved.

My original fix, which uses a saner cost for ACC_REGS:

gpr_acc_cost_3.patch
--- gcc-4.7-20120526-orig/gcc/config/mips/mips.c	2012-06-03 19:28:02.137960837 +0800
+++ gcc-4.7-20120526/gcc/config/mips/mips.c	2012-06-03 19:31:12.587399458 +0800
@@ -11258,7 +11258,7 @@ mips_move_to_gpr_cost (enum machine_mode

     case ACC_REGS:
       /* MFLO and MFHI.  */
-      return 6;
+      return 3;

     case FP_REGS:
       /* MFC1, etc.  */
@@ -11294,7 +11294,7 @@ mips_move_from_gpr_cost (enum machine_mo

     case ACC_REGS:
       /* MTLO and MTHI.  */
-      return 6;
+      return 3;

     case FP_REGS:
       /* MTC1, etc.  */

This also improved things a lot. Again, around 280 files changed.
With gpr_acc_cost_3.patch, the generated assembly came down to 3513371
lines (almost 2000 lines better than the original). Funnily enough,
combining the two patches didn't bring any benefit (it was actually 700
lines worse).

The files below had the biggest changes (the last two columns are the
differences from Orig):

File               Orig  no2ndpass  gprcost  d(gprcost)  d(no2ndpass)
xc_surface.o.s    27158      26940    26700        -458          -218
xc_adpcm.o.s       4137       3994     4004        -133          -143
xc_blit_1.o.s      4199       4199     4101         -98            +0
xc_ts_calib.o.s     276        245      245         -31           -31
xc_camera.o.s      3349       3363     3321         -28           +14
xc_glyph.o.s      10048      10203    10027         -21          +155
xc_blit_A.o.s     10991      10851    10992          +1          -140
xc_events.o.s      7141       7141     7160         +19            +0
xc_miarc.o.s      16172      16317    16235         +63          +145

As can be seen, there is no clear pattern, except that gpr_acc_cost_3.patch
does better. But there are some cases where it is worse (the xc_miarc
file). xc_blit_A is also interesting, as ira-no_2nd_pass.patch improved it
a lot.

Generally the problems look like:

    mflo $14
    sw $14,4($sp)
    lw $15,4($sp)
    mult $2,$10

(the copy in 4($sp) is never used)

Strangely, the 19 additional lines in xc_events.o.s are all caused by bogus
moves. This is strange because the patch only changes the cost of
ACC_REGS, which is not used here:

    addiu $4,$4,%lo(.LC0)
    jal dbg_AssertUtil
    sw $2,48($sp)
    lw $2,48($sp)

(the copy in 48($sp) is never used)

Later I will try to extract some test cases.

BR,
Klaus

>
>
>> Or is it a conceptual part of the algorithm?
>
> No.
>
>> More generally, what kind of situations does the second pass help with?
>
> I cannot show such situations right now, but I did some benchmarking long
> ago on the old RA and the second pass is really important for better code
> generation. At that time I even thought about a 3rd pass for -O3. I don't
> think the situation is different now.
>
> The cost pass is a complicated part. It is impossible to find any good
> literature which could help. The problem lies in GCC's specifics: code
> selection is not done (at least not fully) before RA, and we don't know
> until the end of reload which alternative will be used.
> So some code selection is done in the combiner, some in IRA (including the
> cost pass, by defining allocation classes for pseudos), and final code
> selection is done in reload.
>
> I thought about different approaches to the cost pass but failed to find a
> better heuristic approach. There is an optimal solution to the problem
> solved by the cost pass, but it requires ILP, which is not practical
> because of its slowness.
>
>> I.e. how does it improve on the first pass?
>>
>>
> Richard, thanks for the detailed analysis. I'll try to do some benchmarks
> to measure compilation time. If the slowdown is tolerable, I'll remove the
> code you mentioned (excluding subsets of pressure classes) and submit the
> patch.
>