https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64308

            Bug ID: 64308
           Summary: Missed optimization: 64-bit divide used when 32-bit
                    divide would work
           Product: gcc
           Version: 4.9.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tavianator at gmail dot com

Created attachment 34280
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34280&action=edit
Test case

The following is a fairly typical implementation of exponentiation modulo m:

$ cat ipowm.c
// Computes (b**e) % m
unsigned int
ipowm(unsigned int b, unsigned int e, unsigned int m)
{
  unsigned int ret;
  b %= m;
  for (ret = m > 1; e; e >>= 1) {
    if ((e & 1) == 1) {
      ret = (unsigned long long)ret * b % m;
    }
    b = (unsigned long long)b * b % m;
  }
  return ret;
}

Unfortunately, GCC emits a 64-bit multiply and divide for both "... * b % m"
expressions on x86 and x86-64, where a 32-bit multiply and divide would be
equivalent and faster.

$ gcc -std=c11 -O3 -Wall -S -save-temps ipowm.c
$ cat ipowm.s
...
    imulq    %rdi, %rax
    divq    %rcx
...
    imulq    %rdi, %rax
    divq    %rcx
...

The pattern

    mull    %edi
    divl    %ecx

would be much faster.  They're equivalent because b is always reduced mod m, so
b < m and therefore (for any unsigned int x), x * b / m <= x * m / m == x, thus
the quotient will always fit in 32 bits.

Reply via email to