On Wed, Jun 8, 2011 at 11:42 AM, Tom de Vries <vr...@codesourcery.com> wrote: > Hi Richard, > > I have a patch for PR43864. The patch adds a gimple level duplicate block > cleanup. The patch has been bootstrapped and reg-tested on x86_64, and > reg-tested on ARM. The size impact on ARM for spec2000 is shown in the > following > table (%, lower is better). > > none pic > thumb1 thumb2 thumb1 thumb2 > spec2000 99.9 99.9 99.8 99.8 > > PR43864 is currently marked as a duplicate of PR20070, but I'm not sure that > the > optimizations proposed in PR20070 would fix this PR. > > The problem in this PR is that when compiling with -O2, the example below > should > only have one call to free. The original problem is formulated in terms of > -Os, > but currently we generate one call to free with -Os, although still not the > smallest code possible. I'll show here the -O2 case, since that's similar to > the > original PR. > > #include <stdio.h> > void foo (char*, FILE*); > char* hprofStartupp(char *outputFileName, char *ctx) > { > char fileName[1000]; > FILE *fp; > sprintf(fileName, outputFileName); > if (access(fileName, 1) == 0) { > free(ctx); > return 0; > } > > fp = fopen(fileName, 0); > if (fp == 0) { > free(ctx); > return 0; > } > > foo(outputFileName, fp); > > return ctx; > } > > AFAIU, there are 2 complementary methods of rtl optimizations proposed in > PR20070. > - Merging 2 blocks which are identical expect for input registers, by using a > conditional move to choose between the different input registers. > - Merging 2 blocks which have different local registers, by ignoring those > differences > > Blocks .L6 and.L7 have no difference in local registers, but they have a > difference in input registers: r3 and r1. Replacing the move to r5 by a > conditional move would probably be benificial in terms of size, but it's not > clear what condition the conditional move should be using. Calculating such a > condition would add in size and increase the execution path. > > gcc -O2 -march=armv7-a -mthumb pr43864.c -S: > ... > push {r4, r5, lr} > mov r4, r0 > sub sp, sp, #1004 > mov r5, r1 > mov r0, sp > mov r1, r4 > bl sprintf > mov r0, sp > movs r1, #1 > bl access > mov r3, r0 > cbz r0, .L6 > movs r1, #0 > mov r0, sp > bl fopen > mov r1, r0 > cbz r0, .L7 > mov r0, r4 > bl foo > .L3: > mov r0, r5 > add sp, sp, #1004 > pop {r4, r5, pc} > .L6: > mov r0, r5 > mov r5, r3 > bl free > b .L3 > .L7: > mov r0, r5 > mov r5, r1 > bl free > b .L3 > ... > > The proposed patch solved the problem by dealing with the 2 blocks at a level > when they are still identical: at gimple level. It detect that the 2 blocks > are > identical, and removes one of them. > > The following table shows the impact of the patch on the example in terms of > size for -march=armv7-a: > > without with delta > Os : 108 104 -4 > O2 : 120 104 -16 > Os thumb: 68 64 -4 > O2 thumb: 76 64 -12 > > The gain in size for -O2 is that of removing the entire block, plus the > replacement of 2 moves by a constant set, which also decreases the execution > path. The patch ensures optimal code for both -O2 and -Os. > > > By keeping track of equivalent definitions in the 2 blocks, we can ignore > those > differences in comparison. Without this feature, we would only match blocks > with > resultless operations, due the the ssa-nature of gimples. > For example, with this feature, we reduce the following function to its > minimum > at gimple level, rather than at rtl level. > > int f(int c, int b, int d) > { > int r, e; > > if (c) > r = b + d; > else > { > e = b + d; > r = e; > } > > return r; > } > > ;; Function f (f) > > f (int c, int b, int d) > { > int e; > > <bb 2>: > e_6 = b_3(D) + d_4(D); > return e_6; > > } > > I'll send the patch with the testcases in a separate email. > > OK for trunk?
I don't like that you hook this into cleanup_tree_cfg - that is called _way_ too often. This also duplicates the literal matching done on the RTL level - instead I think this optimization would be more related to value-numbering (either that of SCCVN/FRE/PRE or that of DOM which also does jump-threading). Richard. > Thanks, > - Tom > > 2011-06-08 Tom de Vries <t...@codesourcery.com> > > PR middle-end/43864 > * tree-cfgcleanup.c (int_int_splay_lookup, int_int_splay_insert) > (int_int_splay_node_contained_in, int_int_splay_contained_in) > (equiv_lookup, equiv_insert, equiv_contained_in, equiv_init) > (equiv_delete, gimple_base_equal_p, pt_solution_equal_p, > gimple_equal_p) > (bb_gimple_equal_p, update_debug_stmts, cleanup_duplicate_preds_1) > (same_or_local_phi_alternatives, cleanup_duplicate_preds): New > function. > (cleanup_tree_cfg_bb): Use cleanup_duplicate_preds. >