https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80155

--- Comment #42 from prathamesh3492 at gcc dot gnu.org ---
Hi,
This is another simpler approach I tried to apply "cost-model" on hoisting
before approaching a more general solution:
http://people.linaro.org/~prathamesh.kulkarni/hoist-change-order.diff
In this prototype patch, I changed order of hoisting such that instead hoisting
an expression in first candidate block, it hoists expression one dominator at a
time.

For pr77445-2.c test-case, str_225 + 1 gets hoisted in block 10 because it's
the first candidate block found from the top-down dom-tree walk, which leaves
little room for controlling hoisting.
The patch forces expressions to be inserted in immediate dominator at a time
instead of the first candidate block. With this change, the following series of
hoistings take place for str_225 + 1:

Inserting expression in block 15 for code
hoisting:{pointer_plus_expr,str_225,1} (0079)
Inserting expression in block 14 for code hoisting:
{pointer_plus_expr,str_225,1} (0079)
Inserting expression in block 11 for code hoisting:
{pointer_plus_expr,str_225,1} (0079)
Inserting expression in block 10 for code hoisting:
{pointer_plus_expr,str_225,1} (0079)
Inserting expression in block 53 for code hoisting:
{pointer_plus_expr,str_225,1} (0079)

str_225 + 1 originally appears in blocks 16 and 17. It is then first hoisted
into their predecessor block 15, then into block 14 and so on. The advantage I
see with this order of hoisting is, we can control hoisting after each
insertion in it's immediate dominator. So for instance if according to our cost
model, we reach "hoisting threshold" after say block 14, we can then prevent
further hoistings of str_225 + 1. Whereas with the current approach it gets
hoisted right up to block 10 initially. Alternatively we could try to "sink"
the expression down to dominated blocks. I didn't explore this option yet.

* Cost model for hoisting
The cost model would be entirely target specific defined by a target hook and
shouldn't affect other architectures that don't wish to use it. I suppose a
very simple cost model for hoisting could take following two factors:
a) Number of hoistings of a particular expression measured in terms of
dominator depth - This is recorded by expr_hoist_map which is map<unsigned,
unsigned> the former representing value number of pre_expr and latter
represents the count.
b) Number of insertions in basic block - This is recorded by map<unsigned,
unsigned>, the former representing block index and latter represents the count.

I didn't attempt to define the cost-model in the patch. I was wondering what
could be other potential factors that we can consider ?

* Issues with changing hoisting order
I am not entirely sure if the result of changing hoisting order can result in
correctness issues or missed optimizations ? For some confidence, I validated
the patch with bootstrap+test on x86_64, which worked.

There are two problems I see:
(1) Interference with statistics of hoisting, which is easy to fix.
(2) Does not honor the "expression should be available in at least one
successor" constraint, which leads to more aggressive hoisting for
architectures that will not use cost model. In example above, str_225 + 1 got
hoisted one block further upto block-53, while with current-order it's
restricted to block-10. I suppose we could fix this by recording which
expressions were originally available at end of block ?

The patch passes bootstrap+test on x86_64.

* Hoistings crossing loop boundary - One "peculiarity" I see with FMS function
in pr77445-2.c is that all the hoistings cross loop boundaries at one point,
while other tests have significantly lesser.
I did a quick test with SPEC2006 to collect some data:
(number-of-hoistings vs number-of-functions)
{2: 89, 1: 166, 3: 37, 4: 14, 5: 8, 6: 10, 7: 11, 8: 2, 13: 3, 10: 1, 11: 5, 9:
4, 17: 1, 15: 2, 27: 1, 12: 2, 18: 1, 21: 1, 26: 1}

It seems most of the functions have cross loop hoistings less than 5 with 166
functions having one hoisting inside loop and 89 functions having two hoistings
across loops. I was wondering if a hoisting into a block from it's successor
should have "extra penalty" if it crosses a loop boundary ? Or does hoisting
inside a loop have no effect on register pressure ?

Thanks,
Prathamesh

Reply via email to