On 16/07/12 09:27, Jonathan Nieder wrote:
> Michael Cree wrote:
>
>> On 15/07/2012, at 8:50 AM, Jonathan Nieder wrote:
>
>>> gcc takes full advantage by converting the get_be32
>>> calls back to a load and bswap and producing a whole bunch of
>>> unaligned access traps.
>>
>> Alpha does not have a bswap (or similar) instruction.
> [...]
>> If the pointer is unaligned
>> then two neighbouring aligned 32bit loads are required to ensure
>> that all four bytes are loaded.  If the pointer is aligned then a
>> single 32bit load gets all the four bytes.  Having never looked at
>> the generated assembler code, I nevertheless suspect that is the
>> guts of the optimisation --- the compiler can eliminate an access to
>> memory if it knows the pointer is aligned.
>
> How about:
>
>	gcc takes full advantage by using a single 32-bit load,
>	resulting in a whole bunch of unaligned access traps.
Yes, that's good.  I just checked the generated assembler and I am correct, except that I underestimated the number of memory accesses in the case where the data is known a priori to be unaligned.  It actually does four longword memory accesses, i.e. one memory access for each individual byte, despite needing at most two, so the optimisation for data known a priori to be aligned saves three memory accesses, not just one.

(And it probably also saves a load-load replay trap on the EV6 and later CPUs, i.e. a complete and very costly replay of the whole pipeline when the CPU realises just in the nick of time that it has cocked up the interrelationships between reordered load instructions.)

Cheers
Michael.