On 14/11/2019 06.08, Timur Tabi wrote:
> On 11/12/19 1:14 AM, Rasmus Villemoes wrote:
>> but that's because readl and writel by definition work on little-endian
>> registers. I.e., on a BE platform, the readl and writel implementation
>> must themselves contain a swab, so the above would end up doing two
>> swabs on a BE platform.
>
> Do you know whether the compiler optimizes-out the double swab?
>
Depends. It's almost impossible to figure out how swab32() is defined,
so how much visibility gcc has into how it works is hard to say. But a
further complication is that the arch may not have, say (simplifying
somewhat)

#define readl(x) swab32(*(volatile u32*)x)

but instead have readl implemented as inline asm which includes the
byteswap. PPC being a case in point, where the readl is in_le32 which is
done with a lwbrx instruction, and certainly gcc couldn't in any way
change a swab32(asm("lwbrx")) into asm("lwz").

But ppc defines its own mmio_read32be, so that's not an issue.

>> (On PPC, there's a separate definition of mmio_write32be, namely
>> writel_be, which in turn does a out_be32, so on PPC that doesn't
>> actually end up doing two swabs).
>>
>> So ioread32be etc. have well-defined semantics: access a big-endian
>> register and return the result in native endianness.
>
> It seems weird that there aren't any cross-arch lightweight
> endian-specific I/O accessors.

Agreed, but I'm really not prepared for trying to go down that rabbit
hole again.

Rasmus
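
For illustration only, here is a minimal user-space sketch of the double
swab discussed above. It is not kernel code: model_swab32(),
model_readl(), model_read32be() and check_model() are hypothetical names,
a plain byte array stands in for an MMIO register, and the generic
fallback is assumed to have the shape swab32(readl(addr)). The point is
only that the result is the same on either host endianness, while a BE
host pays for two swabs that cancel.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Plain byte swap, standing in for the kernel's swab32(). */
static uint32_t model_swab32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Returns 1 on a big-endian host, 0 on a little-endian one. */
static int host_is_be(void)
{
	const uint16_t probe = 1;
	uint8_t first;

	memcpy(&first, &probe, 1);
	return first == 0;
}

/*
 * Model of readl(): the register is little-endian by definition and the
 * result is in native endianness, so a BE host has to swab here.
 */
static uint32_t model_readl(const void *reg)
{
	uint32_t raw;

	memcpy(&raw, reg, sizeof(raw));
	return host_is_be() ? model_swab32(raw) : raw;
}

/*
 * Model of the assumed generic fallback, swab32(readl(addr)).
 * On a BE host this performs two swabs that cancel each other.
 */
static uint32_t model_read32be(const void *reg)
{
	return model_swab32(model_readl(reg));
}

/* A register holding 0x12345678 in big-endian byte order. */
static void check_model(void)
{
	const uint8_t be_reg[4] = { 0x12, 0x34, 0x56, 0x78 };

	/* Big-endian accessor: native value of the BE register. */
	assert(model_read32be(be_reg) == 0x12345678u);
	/* Little-endian accessor: same bytes read as an LE register. */
	assert(model_readl(be_reg) == 0x78563412u);
}
```

Calling check_model() from a main() should pass on both LE and BE hosts.
Whether a compiler folds the two swabs in the BE case depends on it
seeing both as plain C, which is exactly what an inline-asm readl
prevents.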