Richard Biener wrote:
> On Tue, Nov 1, 2016 at 10:39 PM, Wilco Dijkstra <wilco.dijks...@arm.com> wrote:
> > If bswap is false no byte swap is needed, so we found a native endian load
> > and it will always perform the optimization by inserting an unaligned load.
>
> Yes, the general agreement is that the expander can do best and thus we
> should canonicalize accesses to larger ones even for SLOW_UNALIGNED_ACCESS.
> The expander will generate the canonical best code (hopefully...).

Right, but there are cases where you have to choose between unaligned and aligned accesses, and you need to know whether the unaligned access is fast. A good example is memcpy expansion: if you have fast unaligned accesses, you should use them to deal with the last few bytes, but if they get expanded, using several aligned accesses is much faster than a single unaligned access.

> > This apparently works on all targets, and doesn't cause alignment traps or
> > huge slowdowns via trap emulation claimed by SLOW_UNALIGNED_ACCESS.
> > So I'm at a loss what these macros are supposed to mean and how I can query
> > whether a backend supports fast unaligned access for a particular mode.
> >
> > What I actually want to write is something like:
> >
> >   if (!FAST_UNALIGNED_LOAD (mode, align)) return false;
> >
> > And know that it only accepts unaligned accesses that are efficient on the
> > target.
> > Maybe we need a new hook like this and get rid of the old one?
>
> No, we don't need another hook.
>
> Note there is another similar user in gimple-fold.c when folding small
> memcpy/memmove to single load/store pairs (patch posted but not applied
> by me -- I've asked for strict-align target maintainer feedback but got
> none).

I couldn't find it. Do you have a link?

> Now - for bswap I'm only 99% sure that unaligned load + bswap is
> better than piecewise loads plus manual swap.

It depends on whether unaligned loads and bswap are expanded or not.
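For reference, here is a minimal C sketch of the two forms being compared, assuming GCC or Clang (for `__builtin_bswap32` and the `__BYTE_ORDER__` macros). The function names are hypothetical. The first form uses only byte loads, so it needs no alignment; the second uses one possibly unaligned 32-bit load (expressed portably via `memcpy`) plus a byte swap, which is roughly the shape the bswap pass aims to produce. Which one is faster depends on whether the target expands the unaligned load and the bswap into multiple instructions.

```c
#include <stdint.h>
#include <string.h>

/* Piecewise form: four byte loads combined with shifts and ORs.
   Only byte accesses are used, so no alignment is required. */
static uint32_t load_be32_piecewise(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Transformed form: one 32-bit load that may be unaligned, followed by
   a byte swap on little-endian hosts. memcpy is the portable way to
   express an unaligned load in C; __builtin_bswap32 is a GCC/Clang
   builtin (assumption: compiled with GCC or Clang). */
static uint32_t load_be32_bswap(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    return v;                      /* already big-endian: no swap needed */
#else
    return __builtin_bswap32(v);   /* little-endian host: swap bytes */
#endif
}
```

Both functions return `0x12345678` for the bytes `{0x12, 0x34, 0x56, 0x78}`, at any address. On a strict-alignment target the `memcpy` in the second form may itself be expanded into byte loads, in which case the extra bswap makes it strictly worse than the first form.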
Even if we assume the expansion is at least as efficient as doing it explicitly (definitely true for modes larger than the native integer size, as we found out in PR77308!), if both the unaligned load and the bswap are expanded, it seems better not to make the transformation for modes up to the word size. But there is no way to find this out, since SLOW_UNALIGNED_ACCESS must be true whenever STRICT_ALIGNMENT is true.

> But generally I'm always in favor of removing SLOW_UNALIGNED_ACCESS /
> STRICT_ALIGNMENT checks from the GIMPLE side of the compiler.

I sort of agree, because the purpose of these macros is unclear: the documentation is insufficient and out of date. However, I do believe we need an accurate way to find out whether a target supports fast unaligned accesses, as that is required to generate good target code.

Wilco
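To illustrate the memcpy expansion case discussed above, here is a minimal C sketch of the choice a copy routine faces for a short tail of 8 to 16 bytes. The `fast_unaligned` parameter is a hypothetical stand-in for the per-target answer that something like the proposed `FAST_UNALIGNED_LOAD` query would give; it is not an existing GCC interface. With fast unaligned accesses, two overlapping 8-byte accesses cover the whole range; without them, the sketch falls back to byte copies, which never need more than byte alignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy n bytes, 8 <= n <= 16, with two possibly overlapping 8-byte
   accesses. Both loads happen before either store. Only worthwhile
   when unaligned 8-byte accesses are cheap on the target. */
static void copy_tail_unaligned(unsigned char *dst, const unsigned char *src,
                                size_t n)
{
    uint64_t head, tail;
    assert(n >= 8 && n <= 16);
    memcpy(&head, src, 8);          /* first 8 bytes */
    memcpy(&tail, src + n - 8, 8);  /* last 8 bytes, may overlap head */
    memcpy(dst, &head, 8);
    memcpy(dst + n - 8, &tail, 8);
}

/* Hypothetical dispatcher: fast_unaligned models the target query. */
static void copy_tail(unsigned char *dst, const unsigned char *src, size_t n,
                      bool fast_unaligned)
{
    if (fast_unaligned && n >= 8 && n <= 16) {
        copy_tail_unaligned(dst, src, n);
        return;
    }
    /* Fallback: byte accesses only, so alignment never matters. */
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```

Both paths copy the same bytes. The difference is purely cost: two wide accesses versus up to sixteen byte accesses, which is only a win if the unaligned accesses are not themselves expanded or emulated.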