------- Comment #4 from rth at gcc dot gnu dot org 2010-04-07 15:45 -------
My best guess is that this optimization should be done late, for instance in the machine-dependent reorg pass. I don't see any place to hook it in earlier.
The problem is that reload should be able to "spill" pseudos containing your GOT addresses and recompute them from the given constants, rather than consuming a stack slot to hold the computed value. This means the number of GOT address loads may vary until after reload, so any size estimate you calculate earlier can be off.

The downside to doing it after reload is that you will already have committed to saving and restoring arm_pic_register in the prologue and epilogue. Given that ARM uses ldm/stm, this ought not to affect your code size often, but it will in the extreme case of a leaf function with no other saved registers.

I guess you'll have to experiment with your implementation to see what gives the best results on a large body of code.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43129
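A rough way to picture the trade-off described above is a post-reload size model: the GOT load count is only final after reload, and the prologue/epilogue cost of saving arm_pic_register is already paid whichever way the late pass decides. The sketch below is hypothetical and not GCC code. The names `SizeCosts`, `save_restore_bytes` and `late_decision` are invented for illustration, the instruction counts are made-up inputs, and it assumes ARM state with fixed 4-byte instructions while ignoring literal-pool words.

```python
from dataclasses import dataclass

INSN_BYTES = 4  # assumption: ARM state, every instruction is 4 bytes


@dataclass(frozen=True)
class SizeCosts:
    """Hypothetical per-sequence costs, counted in instructions."""
    pic_setup: int         # set up arm_pic_register once
    load_via_pic: int      # one GOT address load that uses arm_pic_register
    load_without_pic: int  # one GOT address load recomputed from constants


def save_restore_bytes(other_saved_regs: int) -> int:
    """Bytes reload has committed to for saving/restoring arm_pic_register.

    One stm/ldm pair handles any register list, so the PIC register rides
    along for free unless nothing else is saved (e.g. a leaf function),
    in which case a new stm and ldm are needed.
    """
    return 0 if other_saved_regs > 0 else 2 * INSN_BYTES


def late_decision(got_loads: int, other_saved_regs: int, c: SizeCosts):
    """Decide after reload, when got_loads is final, whether to keep the PIC register.

    Returns (keep_pic, total_bytes). The save/restore bytes were committed
    by reload, so they are counted on both paths.
    """
    committed = save_restore_bytes(other_saved_regs)
    with_pic = (c.pic_setup + got_loads * c.load_via_pic) * INSN_BYTES
    without_pic = got_loads * c.load_without_pic * INSN_BYTES
    keep_pic = with_pic <= without_pic
    return keep_pic, committed + min(with_pic, without_pic)


if __name__ == "__main__":
    costs = SizeCosts(pic_setup=2, load_via_pic=2, load_without_pic=3)  # illustrative only
    # Leaf function, one GOT load: drop the PIC register, but 8 committed bytes are wasted.
    print(late_decision(got_loads=1, other_saved_regs=0, c=costs))  # (False, 20)
    # Many GOT loads, registers already saved by stm: keep it, save/restore is free.
    print(late_decision(got_loads=5, other_saved_regs=4, c=costs))  # (True, 48)
```

The first call shows the "extreme case" from the comment: the late pass correctly avoids the PIC register, yet the stm/ldm pair that reload already committed to cannot be recovered.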