On Wed, Jun 8, 2011 at 11:55 AM, Richard Guenther <richard.guent...@gmail.com> wrote: > On Wed, Jun 8, 2011 at 11:42 AM, Tom de Vries <vr...@codesourcery.com> wrote: >> Hi Richard, >> >> I have a patch for PR43864. The patch adds a gimple level duplicate block >> cleanup. The patch has been bootstrapped and reg-tested on x86_64, and >> reg-tested on ARM. The size impact on ARM for spec2000 is shown in the >> following >> table (%, lower is better). >> >> none pic >> thumb1 thumb2 thumb1 thumb2 >> spec2000 99.9 99.9 99.8 99.8 >> >> PR43864 is currently marked as a duplicate of PR20070, but I'm not sure that >> the >> optimizations proposed in PR20070 would fix this PR. >> >> The problem in this PR is that when compiling with -O2, the example below >> should >> only have one call to free. The original problem is formulated in terms of >> -Os, >> but currently we generate one call to free with -Os, although still not the >> smallest code possible. I'll show here the -O2 case, since that's similar to >> the >> original PR. >> >> #include <stdio.h> >> void foo (char*, FILE*); >> char* hprofStartupp(char *outputFileName, char *ctx) >> { >> char fileName[1000]; >> FILE *fp; >> sprintf(fileName, outputFileName); >> if (access(fileName, 1) == 0) { >> free(ctx); >> return 0; >> } >> >> fp = fopen(fileName, 0); >> if (fp == 0) { >> free(ctx); >> return 0; >> } >> >> foo(outputFileName, fp); >> >> return ctx; >> } >> >> AFAIU, there are 2 complementary methods of rtl optimizations proposed in >> PR20070. >> - Merging 2 blocks which are identical expect for input registers, by using a >> conditional move to choose between the different input registers. >> - Merging 2 blocks which have different local registers, by ignoring those >> differences >> >> Blocks .L6 and.L7 have no difference in local registers, but they have a >> difference in input registers: r3 and r1. Replacing the move to r5 by a >> conditional move would probably be benificial in terms of size, but it's not >> clear what condition the conditional move should be using. Calculating such a >> condition would add in size and increase the execution path. >> >> gcc -O2 -march=armv7-a -mthumb pr43864.c -S: >> ... >> push {r4, r5, lr} >> mov r4, r0 >> sub sp, sp, #1004 >> mov r5, r1 >> mov r0, sp >> mov r1, r4 >> bl sprintf >> mov r0, sp >> movs r1, #1 >> bl access >> mov r3, r0 >> cbz r0, .L6 >> movs r1, #0 >> mov r0, sp >> bl fopen >> mov r1, r0 >> cbz r0, .L7 >> mov r0, r4 >> bl foo >> .L3: >> mov r0, r5 >> add sp, sp, #1004 >> pop {r4, r5, pc} >> .L6: >> mov r0, r5 >> mov r5, r3 >> bl free >> b .L3 >> .L7: >> mov r0, r5 >> mov r5, r1 >> bl free >> b .L3 >> ... >> >> The proposed patch solved the problem by dealing with the 2 blocks at a level >> when they are still identical: at gimple level. It detect that the 2 blocks >> are >> identical, and removes one of them. >> >> The following table shows the impact of the patch on the example in terms of >> size for -march=armv7-a: >> >> without with delta >> Os : 108 104 -4 >> O2 : 120 104 -16 >> Os thumb: 68 64 -4 >> O2 thumb: 76 64 -12 >> >> The gain in size for -O2 is that of removing the entire block, plus the >> replacement of 2 moves by a constant set, which also decreases the execution >> path. The patch ensures optimal code for both -O2 and -Os. >> >> >> By keeping track of equivalent definitions in the 2 blocks, we can ignore >> those >> differences in comparison. Without this feature, we would only match blocks >> with >> resultless operations, due the the ssa-nature of gimples. >> For example, with this feature, we reduce the following function to its >> minimum >> at gimple level, rather than at rtl level. >> >> int f(int c, int b, int d) >> { >> int r, e; >> >> if (c) >> r = b + d; >> else >> { >> e = b + d; >> r = e; >> } >> >> return r; >> } >> >> ;; Function f (f) >> >> f (int c, int b, int d) >> { >> int e; >> >> <bb 2>: >> e_6 = b_3(D) + d_4(D); >> return e_6; >> >> } >> >> I'll send the patch with the testcases in a separate email. >> >> OK for trunk? > > I don't like that you hook this into cleanup_tree_cfg - that is called > _way_ too often. > > This also duplicates the literal matching done on the RTL level - instead > I think this optimization would be more related to value-numbering > (either that of SCCVN/FRE/PRE or that of DOM which also does > jump-threading).
And while at it: Put it in a separate file ;-) Ciao! Steven