https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80155
--- Comment #42 from prathamesh3492 at gcc dot gnu.org --- Hi, This is another simpler approach I tried to apply "cost-model" on hoisting before approaching a more general solution: http://people.linaro.org/~prathamesh.kulkarni/hoist-change-order.diff In this prototype patch, I changed order of hoisting such that instead hoisting an expression in first candidate block, it hoists expression one dominator at a time. For pr77445-2.c test-case, str_225 + 1 gets hoisted in block 10 because it's the first candidate block found from the top-down dom-tree walk, which leaves little room for controlling hoisting. The patch forces expressions to be inserted in immediate dominator at a time instead of the first candidate block. With this change, the following series of hoistings take place for str_225 + 1: Inserting expression in block 15 for code hoisting:{pointer_plus_expr,str_225,1} (0079) Inserting expression in block 14 for code hoisting: {pointer_plus_expr,str_225,1} (0079) Inserting expression in block 11 for code hoisting: {pointer_plus_expr,str_225,1} (0079) Inserting expression in block 10 for code hoisting: {pointer_plus_expr,str_225,1} (0079) Inserting expression in block 53 for code hoisting: {pointer_plus_expr,str_225,1} (0079) str_225 + 1 originally appears in blocks 16 and 17. It is then first hoisted into their predecessor block 15, then into block 14 and so on. The advantage I see with this order of hoisting is, we can control hoisting after each insertion in it's immediate dominator. So for instance if according to our cost model, we reach "hoisting threshold" after say block 14, we can then prevent further hoistings of str_225 + 1. Whereas with the current approach it gets hoisted right up to block 10 initially. Alternatively we could try to "sink" the expression down to dominated blocks. I didn't explore this option yet. * Cost model for hoisting The cost model would be entirely target specific defined by a target hook and shouldn't affect other architectures that don't wish to use it. I suppose a very simple cost model for hoisting could take following two factors: a) Number of hoistings of a particular expression measured in terms of dominator depth - This is recorded by expr_hoist_map which is map<unsigned, unsigned> the former representing value number of pre_expr and latter represents the count. b) Number of insertions in basic block - This is recorded by map<unsigned, unsigned>, the former representing block index and latter represents the count. I didn't attempt to define the cost-model in the patch. I was wondering what could be other potential factors that we can consider ? * Issues with changing hoisting order I am not entirely sure if the result of changing hoisting order can result in correctness issues or missed optimizations ? For some confidence, I validated the patch with bootstrap+test on x86_64, which worked. There are two problems I see: (1) Interference with statistics of hoisting, which is easy to fix. (2) Does not honor the "expression should be available in at least one successor" constraint, which leads to more aggressive hoisting for architectures that will not use cost model. In example above, str_225 + 1 got hoisted one block further upto block-53, while with current-order it's restricted to block-10. I suppose we could fix this by recording which expressions were originally available at end of block ? The patch passes bootstrap+test on x86_64. * Hoistings crossing loop boundary - One "peculiarity" I see with FMS function in pr77445-2.c is that all the hoistings cross loop boundaries at one point, while other tests have significantly lesser. I did a quick test with SPEC2006 to collect some data: (number-of-hoistings vs number-of-functions) {2: 89, 1: 166, 3: 37, 4: 14, 5: 8, 6: 10, 7: 11, 8: 2, 13: 3, 10: 1, 11: 5, 9: 4, 17: 1, 15: 2, 27: 1, 12: 2, 18: 1, 21: 1, 26: 1} It seems most of the functions have cross loop hoistings less than 5 with 166 functions having one hoisting inside loop and 89 functions having two hoistings across loops. I was wondering if a hoisting into a block from it's successor should have "extra penalty" if it crosses a loop boundary ? Or does hoisting inside a loop have no effect on register pressure ? Thanks, Prathamesh