Re: RFD: Integrate shorten_branches, machine-dependent constant pool placement and small-scale hot/cold partitioning

Richard Earnshaw Tue, 16 May 2006 03:14:49 -0700

On Sun, 2006-05-14 at 17:09, Joern RENNECKE wrote:
> The constant pool placement that sh_reorg does has somewhat hapazard
>  results. It does not take execution frequencies into account, so if
>  you are unlucky, you can end up with a constant table wedged into some
>  hoit spot of the code, which not only adds an extra jump into the
>  critical path, but can also cause conditional branches to go out of
>  range, which are then converted into branches around jumps or branches
>  to jumps. A related problem is that branch splitting does not take
>  into account the branch probabilities, and the placement of jumps -
>  and if necessary, constants for these jumps - is also rather ad-hoc.
>  Moreover, when you think about optimal placement of individual jumps,
>  it is only a little step further to think about small basic blocks
>  ending with a jump which are branched over by a likely taken branch. 
>  Or if the branched-around code is very seldom executed, we can also
>  consider it even if it falls through at the end to merge with the
>  destination of the branch. On architectures where a not-taken branch
>  is cheaper than a taken branch, it makes sense to move this code out
>  of the way - not far away like the large-scale hot/cold partitioning
>  does, but rather within the range of conditional branches. For the SH,
>  this means within -252..+258 bytes.  That fits nicely with the range
>  of of 16-bit constants, which are in a range of 4..514 bytes. 32-bit
>  constants can be a bit farther, in the range 4..1024 bytes.
> 
> In order to do the constant and jump / cold code placement sensibly, I 
> need to be able to determine which branches go out of range beause of a
>  particular. With the current separation of branch shortening and
>  constant table placement, there are no useful estimates - before
>  constant placements, we estimate that no single conditional branch is
>  in range.  The problem here is that we have to assume that a maximum
>  sized constant pool of 1020 bytes might be inserted anywhere, even
>  though in practice most constant pools are much smaller. The length of
>  the branches themselves is estimated at 6 when it should be 2.
> 
> I'e realized now that I can take the code of shorten_branches, add two 
> lookup arrays and a bit of code (which won't change the time complexity
>  of shorten_branches), I can calculate much better estimates for
>  instruction sizes for indeterminate constant pool placement - simply
>  by keeping track of the maximum amount of constants that could be
>  inserted into any one place.  The informationj calculated by such a
>  modifed shorten_branches pass would also allow to determine when a
>  short branch is possible in the absence of a cosntant pool table, and
> at what table size inserted it will go out of range.
> 
> I wonder now if I should keep this as SH-specific code, or does it make
>  sense to write this a bit more generic - i.e. a variable number of
>  constant ranges, configurable size of small cold blocks, and the range
>  of branches selectable - and provide this as a new piece of gcc
>  infrastructure.
> 
> Do other port maintainers see similar issues with their ports?


Yes, the problem on Thumb-1 is the same in almost all respects:
1) Conditional branches have limited range
2) Pools (sometimes) have to be inserted in the middle of the the
instruction flow (if we can't find a suitable branch to hide the pool
behind).
3) Pool placement and branch shortening both affect each other, making
it hard to find the global optimum.

With the Thumb code there is an additional problem that I'd like to try
and solve:
- Small functions all get their own constant pool, even if they could
reach further to allow pool merging.

Given the distinct similarities here, I think it is worth investigating
whether some common solution can be found.  It's silly for both backends
to be maintaining code that does substantially the same thing.

R.

Re: RFD: Integrate shorten_branches, machine-dependent constant pool placement and small-scale hot/cold partitioning

Reply via email to