On Tue, Jan 09, 2018 at 09:13:23PM -0800, Andrew Pinski wrote:
> On Tue, Jan 9, 2018 at 6:54 AM, Segher Boessenkool
> <seg...@kernel.crashing.org> wrote:
> > On Tue, Jan 09, 2018 at 12:23:42PM +0000, Wilco Dijkstra wrote:
> >> Segher Boessenkool wrote:
> >> > On Mon, Jan 08, 2018 at 0:25:47PM +0000, Wilco Dijkstra wrote:
> >> >> > Always pairing two registers together *also* degrades code quality.
> >> >>
> >> >> No, while it's not optimal, it means smaller code and fewer memory
> >> >> accesses.
> >> >
> >> > It means you execute *more* memory accesses. Always. This may be
> >> > sometimes hidden, sure. I'm not saying you do not want more ldp's;
> >> > I'm saying this particular strategy is very far from ideal.
> >>
> >> No it means less since the number of memory accesses reduces (memory
> >> bandwidth may increase but that's not an issue).
> >
> > The problem is *more* memory accesses are executed at runtime. Which is
> > why separate shrink-wrapping does what it does: to have *fewer* executed.
> > (It's not just the direct execution cost why that helps: more important
> > are latencies to dependent ops, microarchitectural traps, etc.).
>
> On most micro-arch of AARCH64, having one LDP/STP will take just as
> long as one LDR/STR as long as it is on the same cache line.
> So having one LDP/STP compared to two LDR/STR is much better. LDP/STP
> is considered one memory access really and that is where the confusion
> is coming from. We are reducing the overall number of memory accesses
> or keeping it the same on that path.
> Hope this explanation allows you to understand why pairing does not
> degrade the code quality but improves it overall.
Of course I see that ldp is useful. I just don't think that this
particular way of forcing more pairs is a good idea. It needs testing,
benchmarking, and instrumentation, and we haven't seen any of that.

Forcing pairs before separate shrink-wrapping reduces the effectiveness
of the latter considerably.


Segher
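A toy counting model may help pin down where the two positions above differ. It is a hypothetical sketch, not GCC's pass logic: it assumes a function with callee-saved registers x19 to x22, an invented call profile, and two simplified strategies, (a) register pairs fixed before shrink-wrapping, where a pair is saved on any path that needs either member, and (b) saving only the registers each path actually uses and pairing those. It counts executed save/restore instructions (the "one LDP/STP is one access" view) and register slots moved (closer to the "more accesses executed" view). It ignores latency, cache-line effects, and the real requirement that STP/LDP slots be adjacent in the frame.

```python
from math import ceil

# Toy model only (hypothetical, not GCC's algorithm): count the
# save/restore instructions and register slots executed for
# callee-saved registers on each control-flow path.

def pair_first_cost(pairs, used):
    """Pairs are fixed up front. A pair is saved with one STP and
    restored with one LDP on any path that uses either member.
    Returns (instructions, register slots moved)."""
    touched = [p for p in pairs if used & set(p)]
    return 2 * len(touched), 4 * len(touched)

def wrap_first_cost(used):
    """Only registers used on the path are saved and restored; they
    are paired greedily, with one STR/LDR for an odd register.
    Ignores the real constraint that STP/LDP slots must be adjacent."""
    insns_each_way = ceil(len(used) / 2)
    return 2 * insns_each_way, 2 * len(used)

def total(paths, cost):
    """Sum (instructions, slots) over (call_count, used_registers) paths."""
    insns = sum(n * cost(used)[0] for n, used in paths)
    slots = sum(n * cost(used)[1] for n, used in paths)
    return insns, slots

# Hypothetical profile: 90 calls take a hot path using x19 and x21,
# 10 calls take a cold path using all four registers.
paths = [(90, {"x19", "x21"}), (10, {"x19", "x20", "x21", "x22"})]
fixed_pairs = [("x19", "x20"), ("x21", "x22")]

print(total(paths, lambda u: pair_first_cost(fixed_pairs, u)))  # (400, 800)
print(total(paths, wrap_first_cost))                             # (220, 440)
```

With these made-up numbers the fixed pairing executes 400 instructions against 220, because the hot path drags both pairs in. Choosing the pairs as (x19, x21) and (x20, x22) instead brings the pair-first total down to (220, 440) as well, so in this model the cost comes from committing to pairs before knowing where each register is needed. For a path that uses only x19, both strategies execute two instructions (STP/LDP versus STR/LDR), but pair-first moves four register slots instead of two, which is exactly where the two views in the thread part ways.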