Thank you for the comments. > -----Original Message----- > From: Eric Botcazou [mailto:[email protected]] > Sent: Wednesday, September 05, 2012 9:39 PM > To: Zhenqiang Chen > Cc: [email protected] > Subject: Re: [PATCH] Enable bbro for -Os > > > Basic block reordering is disabled for -Os from gcc 4.7 since the pass > > will lead to big code size regression. But benchmarks logs also show > > there are lots of regression due to poor code layout compared with 4.6. > > > > The patch is to enable bbro for -Os. When optimizing for size, it > > * avoid duplicating block. > > * keep its original order if there is no chance to fall through. > > * ignore edge frequency and probability. > > * handle predecessor first if its index is smaller to break long trace. > > * only connect Trace n with Trace n + 1 to reduce long jump. > > > > Here are the CSiBE code size benchmark results: > > * For ARM, code size reduces 0.21%. > > * For MIPS, code size reduces 0.25%. > > * For PPC, code size reduces 0.33%. > > * For X86, code size reduces 0.22%. > > Interesting figures. The patch looks good overall but, since the -Os path > deviates substantially from the original algorithm, it needs to be clearly > documented in the comment at the top of the file (before "Reference"), e.g. > > "The above description is for the full algorithm, which is used when the > function is optimized for speed. When the function is optimized for size, in > order to <...insert reasons here...>, the algorithm is modified as follows: > <...list modifications here...>"
Add the following comments:
+ The above description is for the full algorithm, which is used when the
+ function is optimized for speed. When the function is optimized for
size,
+ in order to reduce long jump and connect more fall through edges, the
+ algorithm is modified as follows:
+ (1) Break long trace to short ones. The trace is broken at a block,
which
+ has multi-predecessors/successors during finding traces. When
connecting
+ traces, only connect Trace n with Trace n + 1. This change reduces most
+ long jumps compared with the above algorithm.
+ (2) Ignore the edge probability and frequency for fall through edges.
+ (3) Keep its original order when there is no chance to fall through.
bbro
+ bases on the result of cfg_cleanup, which does lots of optimizations on
cfg.
+ So the order is expected to be kept if no fall through.
+
+ To implement the change for code size optimization, block's index is
+ selected as the key and all traces are found in one round.
> @@ -558,6 +564,14 @@ find_traces_1_round (int branch_th, int exec_th,
> gcov_type count_th,
> /* Add all non-selected successors to the heaps. */
> FOR_EACH_EDGE (e, ei, bb->succs)
> {
> + /* Wait for the predecessors. */
> + if ((e == best_edge) && for_size
> + && (EDGE_COUNT (best_edge->dest->succs) > 1
> + || EDGE_COUNT (best_edge->dest->preds) > 1))
> + {
> + best_edge = NULL;
> + }
> +
> if (e == best_edge
> || e->dest == EXIT_BLOCK_PTR
> || bb_visited_trace (e->dest))
>
> I don't really understand what this means and why this is done here. If
you
> don't want to add the best destination in some cases, why not do it just
> before the loop and explicitly state the reason? And you don't need
> parentheses.
The code segment should be before the loop. Add the following comments for
the code:
+ /* If the best destination has multiple successors or
predecessors,
+ don't allow it to be added when optimizing for size. This
makes
+ sure predecessors with smaller index handled before the best
+ destination. It breaks long trace and reduces long jumps.
+
+ Take if-then-else as an example.
+ A
+ / \
+ B C
+ \ /
+ D
+ If we do not remove the best edge B->D/C->D. The final order
is
+ A B D ... C. C is at the end of the program. If D or
successors
+ of D are complicated, might need long jump for A->C and C->D.
+ Similar issue for order: A C D ... B.
+
+ After removing the best edge, the final result will be
ABCD/ACBD.
+ It does not add jump compared with the previous order. But it
+ reduce the possibility of long jump. */
+ if (best_edge && for_size
+ && (EDGE_COUNT (best_edge->dest->succs) > 1
+ || EDGE_COUNT (best_edge->dest->preds) > 1))
+ best_edge = NULL;
All other comments are accepted.
The updated patch is attached. Is it OK?
Thanks!
-Zhenqiang
Enable-bbro-for-size-updated2.patch
Description: Binary data
