https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95650
--- Comment #6 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
AArch32 is able to produce the optimal sequence because the ABI specifies caller widening of parameters. For safety reasons AArch64 takes the opposite approach and requires the callee to narrow arguments. Sadly, because this isn't handled at the gimple level, it has to be detected during RTL optimization.

Yes, the optimization is sound: the bits at or above bit 16 in the input values cannot affect the low 16 bits of the result of an addition, because carries only propagate upwards. It's also likely that the expanders cannot see enough of what is going on to transform this efficiently either.
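As a rough illustration of the soundness argument (this is not the testcase from the bug report, and the function names below are hypothetical), the following C sketch compares narrowing both inputs to 16 bits before an addition against adding the full-width values and narrowing only the result. The two should agree for any inputs, since the upper bits never feed into the low 16 bits of the sum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: narrow each input to 16 bits, then add.
   This mirrors what a callee does if it narrows every argument first. */
uint16_t add_narrow_first(uint32_t a, uint32_t b)
{
    return (uint16_t)((uint16_t)a + (uint16_t)b);
}

/* Hypothetical helper: add the full-width values and narrow only the
   result, i.e. the separate narrowing of the inputs is dropped. */
uint16_t add_narrow_last(uint32_t a, uint32_t b)
{
    return (uint16_t)(a + b);
}

/* Check that both forms agree on a handful of sample values, including
   ones with garbage in the bits at or above bit 16. */
void check_samples(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 0xFFFFu, 0x10000u, 0x1FFFFu, 0xABCD1234u, 0xFFFFFFFFu
    };
    const size_t n = sizeof samples / sizeof samples[0];

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            assert(add_narrow_first(samples[i], samples[j]) ==
                   add_narrow_last(samples[i], samples[j]));
}
```

Calling check_samples() from a test harness should pass without tripping an assertion. The same reasoning does not carry over to operations where high bits can influence low bits, such as right shifts or division.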