https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108840
Bug ID: 108840
Summary: Aarch64 doesn't optimize away shift counter masking
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jakub at gcc dot gnu.org
Target Milestone: ---
As mentioned in
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612214.html
aarch64 doesn't optimize away the `and' instructions that mask a shift count
when more than one shift uses the same count. Consider -O2 -fno-tree-vectorize:
int
foo (int x, int y)
{
  return x << (y & 31);
}

void
bar (int x[3], int y)
{
  x[0] <<= (y & 31);
  x[1] <<= (y & 31);
  x[2] <<= (y & 31);
}

void
baz (int x[3], int y)
{
  y &= 31;
  x[0] <<= y;
  x[1] <<= y;
  x[2] <<= y;
}

void corge (int, int, int);

void
qux (int x, int y, int z, int n)
{
  n &= 31;
  corge (x << n, y << n, z >> n);
}
foo is optimized correctly: combine matches the shift with the embedded
masking. In the remaining cases, however, the desirable combination is
rejected because of costs. A shift with embedded masking of the count should
have the same rtx_cost as a plain shift, since under the hood it is just the
shift itself.