http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57052
Bug #: 57052
Summary: missed optimization with rotate and mask
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: amo...@gmail.com

/* -m32 -O -S */
int foo (unsigned int x, int r)
{
  return ((x << r) | (x >> (32 - r))) & 0xff;
}

results in:

foo:
	rlwnm 3,3,4,0xffffffff
	rlwinm 3,3,0,24,31
	blr

Compiling the same code with -m32 -O -S -mlittle gives the properly
optimized result of:

foo:
	rlwnm 3,3,4,0xff
	blr

This is because many of the rs6000.md rotate/shift and mask patterns use
subregs with wrong byte offsets.  e.g. rotlsi3_internal7, the insn that
ought to match here, has (subreg:QI (rotate:SI ...) 0).  The 0 selects
the most significant byte when BYTES_BIG_ENDIAN and the least
significant when !BYTES_BIG_ENDIAN.  Fortunately combine doesn't seem to
generate subregs for high parts, so changing the testcase mask to
0xff000000 doesn't result in wrong code.  Annoyingly, rotlsi3_internal4
would match here too if combine_simplify_rtx() didn't simplify
(set (reg:SI) (and:SI () 255)) to use subregs.
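For context, a host-side sketch (not part of the report) of the equivalence the missed optimization relies on: the testcase's shift-or-mask expression is just a 32-bit rotate followed by a low-byte mask, which is why a single rlwnm with mask 0xff suffices.  The helper rotl32 below is mine, added for the comparison.

/* Illustration only: check that foo() equals rotate-then-mask. */
#include <assert.h>
#include <stdio.h>

/* The testcase from the report. */
static int foo (unsigned int x, int r)
{
  return ((x << r) | (x >> (32 - r))) & 0xff;
}

/* Plain 32-bit rotate left (hypothetical helper, not in the report).
   Valid for r in 1..31; r == 0 would shift by 32, which is
   undefined in C, matching the testcase's own constraint. */
static unsigned int rotl32 (unsigned int x, int r)
{
  return (x << r) | (x >> (32 - r));
}

int main (void)
{
  unsigned int x = 0x12345678u;
  for (int r = 1; r < 32; r++)
    assert (foo (x, r) == (int) (rotl32 (x, r) & 0xff));
  printf ("ok\n");
  return 0;
}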
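The subreg point can be illustrated in plain C (again an analogy I'm adding, not the report's own code): byte offset 0 of a stored word names the lowest-addressed byte, which is the most significant byte on a big-endian target and the least significant on a little-endian one, just as (subreg:QI ... 0) behaves under BYTES_BIG_ENDIAN.

/* Illustration only: which logical byte sits at offset 0? */
#include <assert.h>
#include <stdio.h>
#include <string.h>

int main (void)
{
  unsigned int v = 0x12345678u;
  unsigned char b0;

  /* Read the byte at offset 0 of the word's storage. */
  memcpy (&b0, &v, 1);

  if (b0 == 0x12)
    printf ("big-endian: offset 0 is the most significant byte\n");
  else
    printf ("little-endian: offset 0 is the least significant byte\n");

  /* One of the two, depending on the host. */
  assert (b0 == 0x12 || b0 == 0x78);
  return 0;
}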