https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86209

--- Comment #9 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
(In reply to ktkachov from comment #7)
> The other thing to consider with merging loads is how the result is used.
> In your example if you merge the 16-bit loads into a single 32-bit register
> load you'll have to add instructions to extract the low and high parts into
> separate registers in order to add them together and that can end up being
> more expensive overall.

It depends on whether or not you need those top bits.  For the example cited,
on aarch64 you could compile it to (assuming no strict alignment):

subus:
    ldr   w0, [x0]
    add   w0, w0, w0, lsr #16
    uxth  w0, w0
    ret

In fact, since you don't care about any of the top bits (the function returns
an unsigned short), you can drop the uxth as well.

In this case, you don't even have to worry about big- vs little-endianness,
since addition is commutative.
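
As a rough C-level sketch of the same idea (the function names below are
made up for illustration and are not taken from the bug's testcase): the
merged version does one 32-bit load and adds the word to itself shifted
right by 16, and the low 16 bits of that sum equal the truncated sum of the
two halves whichever half ends up on top.  The memcpy is only there to
express the unaligned 32-bit load without undefined behaviour in C.

```c
#include <stdint.h>
#include <string.h>

/* Reference: two separate 16-bit loads, sum truncated to 16 bits. */
static uint16_t sum_separate(const uint16_t *p)
{
    return (uint16_t)(p[0] + p[1]);
}

/* The "add w0, w0, w0, lsr #16" step on an already-loaded word.
   The cast to uint16_t plays the role of the uxth. */
static uint16_t fold_word(uint32_t w)
{
    return (uint16_t)(w + (w >> 16));
}

/* Merged: a single 32-bit load, then the fold above. */
static uint16_t sum_merged(const uint16_t *p)
{
    uint32_t w;
    memcpy(&w, p, sizeof w);   /* stands in for the single ldr */
    return fold_word(w);
}
```

With halves a and b, the loaded word is either a + (b << 16) or
b + (a << 16) depending on byte order; adding the word shifted right by 16
gives a + b in the low 16 bits in both cases, and any carry only lands in
the bits that are discarded.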
