On Wed, Apr 25, 2012 at 8:06 AM, Maxim Kuvyrkov <[email protected]> wrote:
> On 18/04/2012, at 9:17 PM, Richard Guenther wrote:
>
>> On Wed, Apr 18, 2012 at 4:15 AM, Maxim Kuvyrkov <[email protected]>
>> wrote:
>>> Steven,
>>> J"orn,
>>>
>>> I am looking into fixing performance regression on EEMBC's bitmnp01, and a
>>> version of your combined patch attached to PR38785 still works very well.
>>> Would you mind me getting it through upstream review, or are there any
>>> issues with contributing this patch to GCC mainline?
>>>
>>> We (CodeSourcery/Mentor) were carrying this patch in our toolchains since
>>> GCC 4.4, and it didn't show any performance or correctness problems on x86,
>>> ARM, MIPS, and other architectures. It also reliably fixes bitmnp01
>>> regression, which is still present in current mainline.
>>>
>>> I have tested this patch on recent mainline on i686-linux-gnu with no
>>> regressions. Unless I hear from you to the contrary, I will push this
>>> patch for upstream review and, hopefully, get it checked in.
>>>
>>> Previous discussion of this patch is at
>>> http://gcc.gnu.org/ml/gcc-patches/2009-03/msg00250.html
>>
>> The addition of -ftree-pre-partial-partial is ok if you change its name to
>> -ftree-partial-pre and add documentation to invoke.texi.
>
> OK. Gerald, does the patch for gcc-4.8/changes.html look OK?
>
>>
>> + /* Assuming the expression is 50% anticipatable, we have
>> + to multiply the number of insertions needed by two for a
>> cost
>> + comparison. */
>>
>> why assume 50% anticipatibility if you can compute the exact
>> anticipatibility?
>
> Do you mean partial anticipatibility as in the updated patch? To compute
> exact anticipatibility we would need to traverse the bottom part of CFG, to
> get the numbers right.
>
>>
>> + if (!optimize_function_for_speed_p (cfun)
>>
>> please look at how I changed regular PRE to look at whether we want to
>> optimize the path which has the redundancy for speed via
>> optimize_edge_for_speed_p. The same surgerly should be applied to
>> PPRE.
>
> OK. In the updated patch we require at least one of the successors the
> partially anticipates the expression to be on the speed path.
>
> Any other comments?
+ppre_n_insert_for_speed_p (pre_expr expr, basic_block block,
+ unsigned int inserts_needed)
+{
+ /* The more expensive EXPR is, the more we should be prepared to insert
+ in the predecessors of BLOCK to make EXPR fully redundant.
+ For now, only recognize AND, OR, XOR, PLUS and MINUS of a multiple-use
+ SSA_NAME with a constant as cheap. */
the function implementation is odd. cost is always 1 when used, and
both memory loads and calls are always cheap, but for example casts are not?
Isn't
return EDGE_COUNT (block->preds) * cost >= inserts_needed;
always true? Or is inserts_needed not what it suggests?
+ /* Insert only if we can remove a later expression on a path
+ that we want to optimize for speed.
+ The phi node that we will be inserting in BLOCK is not free,
+ and inserting it for the sake of !optimize_for_speed successor
+ may cause regressions on the speed path. */
+ FOR_EACH_EDGE (succ, ei, block->succs)
+ {
+ if (bitmap_set_contains_value (PA_IN (succ->dest), val))
+ {
+ if (optimize_edge_for_speed_p (succ))
+ do_insertion = true;
+
+ ++pa_succs;
+ }
+ }
the change up to here looks good to me - can you factor out the command-line
switch plus this optimize_edge_for_speed_p test into a separate patch
(hereby approved)?
Thanks,
Richard.
> Thank you,
>
> --
> Maxim Kuvyrkov
> CodeSourcery / Mentor Graphics
>