https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82498
Bug ID: 82498
Summary: Missed optimization for x86 rotate instruction
Product: gcc
Version: 7.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: lloyd at randombit dot net
Target Milestone: ---

GCC doesn't seem to realize that x86 masks the high bits of the rotation count in the rol/ror instructions.

GCC 7.2.0 on x86-64 compiles this function, which attempts to do a 32-bit rotate without invoking undefined behavior:

#include <stdint.h>

uint32_t rotate_left(uint32_t input, uint8_t rot)
{
   if(rot == 0)
      return input;
   rot %= 8 * sizeof(uint32_t);
   return static_cast<uint32_t>((input << rot) | (input >> (8*sizeof(uint32_t)-rot)));
}

into:

        movl    %esi, %ecx      # rot, rot
        movl    %edi, %eax      # input, tmp97
        andl    $31, %ecx       #, rot
        roll    %cl, %eax       # rot, tmp97
        testb   %sil, %sil      # rot
        cmove   %edi, %eax      # tmp97,, input, <retval>

The `andl` is unnecessary, as the machine will mask the rotation amount for us (for a 32-bit operand, only the low 5 bits of %cl are used). In addition, the testb/cmove pair can be omitted, since a rotation by zero leaves the input unchanged.

Overall this resulted in a ~15% slowdown in some code using many variable rotations (the CAST-128 cipher being used in an OpenPGP library).

Some related (but not quite the same, and supposedly fixed) issues: 57157, 59100
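
For comparison, here is a minimal sketch of a branch-free formulation that also avoids the undefined shift by 32. It is not part of the original report: the name `rotate_left_masked` is hypothetical, and whether a given GCC version compiles it to a single `roll` is an assumption that would need to be checked against the generated assembly.

```cpp
#include <cstdint>

// Hypothetical helper, not from the original report.
// Both shift counts are masked to the range 0..31, so neither shift
// can reach the operand width, and no special case for rot == 0 is needed.
inline uint32_t rotate_left_masked(uint32_t input, uint8_t rot)
{
   const unsigned mask = 8 * sizeof(uint32_t) - 1;   // 31 for a 32-bit word
   const unsigned r = rot & mask;                    // same reduction as rot % 32
   // (-r) & mask equals (32 - r) for r in 1..31 and 0 for r == 0,
   // so a zero rotation yields input | input, which is just input.
   return (input << r) | (input >> ((-r) & mask));
}
```

Calling `rotate_left_masked(x, 0)` or `rotate_left_masked(x, 32)` returns `x` unchanged, which matches the behavior of the `rotate_left` function above for those inputs.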