https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70119
            Bug ID: 70119
           Summary: AArch64 should take advantage of implicit truncation
                    of variable shift amount without defining
                    SHIFT_COUNT_TRUNCATED
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64

Consider the testcases:

unsigned f1(unsigned x, int y)
{
  return x << (y & 31);
}

unsigned long f2(unsigned long x, int y)
{
  return x << (y & 63);
}

unsigned long f3 (unsigned long bit_addr)
{
  unsigned long bitnumb = bit_addr & 63;
  return (1L << bitnumb);
}

Currently we generate for -O2:

f1:
        and     w1, w1, 31
        lsl     w0, w0, w1
        ret

f2:
        and     w1, w1, 63
        lsl     x0, x0, x1
        ret

f3:
        and     x0, x0, 63
        mov     x1, 1
        lsl     x0, x1, x0
        ret

The masking of the shift amount could be omitted because lsl (and the
other shift/rotate instructions) implicitly truncates its shift amount
to the register width. GCC could figure that out if we defined
SHIFT_COUNT_TRUNCATED, but we can't do that for TARGET_SIMD because the
variable shift patterns have alternatives that perform the shifts on
the vector registers, and those instructions don't truncate their shift
amount.

A simple solution is to write a pattern for combine to catch a
shift/rotate by an and-immediate and emit the plain ALU shift/rotate
instruction:

(set (reg:SI 1)
     (ashift:SI (reg:SI 2)
                (and:QI (reg:QI 3) (const_int 31))))

The AND operation is in QImode because the shift expanders expand the
shift amount to a QImode value. This doesn't quite work, however:
during combine the midend creates a subreg of the whole AND expression
for the shift amount:

(subreg:QI (and:SI (reg:SI x1) (const_int 31)) 0)

instead of propagating the subreg inside the AND. Some discussion at:
https://gcc.gnu.org/ml/gcc/2016-02/msg00357.html (thread continues into
2016-03)

One solution could be to teach simplify-rtx to move the subreg inside.
Another proposed solution is to teach the backend to match different
modes for the shift amount. However, I haven't had much luck
implementing that idea. The "ashl" standard name must expand to a
single mode for the shift amount, and any explicit masking operation
(like in the testcases) that is propagated into the shift amount must
be forced to that mode. For QImode and SImode shift amounts I see the
same issue as above (a subreg of an AND-immediate), and for DImode
shift amounts I see zero_extends of SImode rtxes being created for the
shift amount, which also don't match.
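For reference, the combine pattern described earlier might be sketched roughly as follows in aarch64.md. This is illustrative only (the pattern name, condition, and attributes are made up for the sketch); it is the shape that currently fails to match because combine wraps the whole AND in a subreg rather than pushing the subreg inside:

```
;; Sketch: catch a 32-bit shift whose QImode amount has been masked
;; with 31 and emit the plain ALU lsl, relying on the hardware's
;; implicit truncation of the variable shift amount.
(define_insn "*aarch64_ashlsi3_mask"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (ashift:SI
          (match_operand:SI 1 "register_operand" "r")
          (and:QI (match_operand:QI 2 "register_operand" "r")
                  (const_int 31))))]
  ""
  "lsl\t%w0, %w1, %w2"
)
```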