https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64308
Bug ID: 64308 Summary: Missed optimization: 64-bit divide used when 32-bit divide would work Product: gcc Version: 4.9.2 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: tavianator at gmail dot com Created attachment 34280 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34280&action=edit Test case The following is a fairly typical implementation of exponentiation modulo m: $ cat ipowm.c // Computes (b**e) % m unsigned int ipowm(unsigned int b, unsigned int e, unsigned int m) { unsigned int ret; b %= m; for (ret = m > 1; e; e >>= 1) { if ((e & 1) == 1) { ret = (unsigned long long)ret * b % m; } b = (unsigned long long)b * b % m; } return ret; } Unfortunately, GCC emits a 64-bit multiply and divide for both "... * b % m" expressions on x86 and x86-64, where a 32-bit multiply and divide would be equivalent and faster. $ gcc -std=c11 -O3 -Wall -S -save-temps ipowm.c $ cat ipowm.s ... imulq %rdi, %rax divq %rcx ... imulq %rdi, %rax divq %rcx ... The pattern mull %edi divl %ecx would be much faster. They're equivalent because b is always reduced mod m, so b < m and therefore (for any unsigned int x), x * b / m <= x * m / m == x, thus the quotient will always fit in 32 bits.