Issue 179678
Summary [X64] Missed optimization for 64/32-bit division (with a 64-bit or a 32-bit result)
Labels new issue
Assignees
Reporter rrrola
    [https://godbolt.org/z/xfoM5bqon](https://godbolt.org/z/xfoM5bqon)
When compiling a 64/32-bit division, clang 21 with -O2 checks whether the dividend also fits in 32 bits.
Depending on the result, it emits either a full 64/64-bit division or a faster 32/32-bit division (implemented as a 128/64-bit or a 64/32-bit DIV, respectively, with the high word of the dividend zeroed):
```
div_rem:
        mov     rax, rdx
        mov     ecx, esi
        shr     rdx, 32
        je      .LBB0_1
        xor     edx, edx
        div     rcx
        mov     dword ptr [rdi], edx
        ret
.LBB0_1:
        xor     edx, edx
        div     ecx
        mov     dword ptr [rdi], edx
        ret
```
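The C source is only available through the Compiler Explorer link and is not reproduced in this report. As a rough sketch, a function consistent with the register usage above (pointer in `rdi`, 32-bit divisor in `esi`, 64-bit dividend in `rdx`, quotient returned in `rax`) might look like the following. The signature, the parameter names and the `assert` are assumptions made for illustration, not the reporter's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reconstruction: divide a 64-bit dividend n by a 32-bit
   divisor d, store the remainder through rem, return the quotient.
   The argument order is chosen to match rdi/esi/rdx in the listing. */
uint64_t div_rem(uint32_t *rem, uint32_t d, uint64_t n) {
    assert(d != 0);             /* division by zero is undefined in C */
    *rem = (uint32_t)(n % d);   /* the remainder is below d, so it fits in 32 bits */
    return n / d;               /* the quotient may need all 64 bits */
}
```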

The 64/32-bit division path could be used more often.
If the high word of the dividend is smaller than the divisor, the dividend is less than divisor * 2^32, so the quotient of a 64/32-bit division fits in 32 bits and `div ecx` cannot overflow.
Expected assembly:
```
div_rem:
        mov     rax, rdx
        mov     ecx, esi
        shr     rdx, 32
        cmp     edx, ecx   ; high word of dividend < divisor?
        jb      .LBB0_1
        xor     edx, edx   ; no - need to use 128/64-bit division
        div     rcx
        mov     dword ptr [rdi], edx
        ret
.LBB0_1:                   ; yes - can use 64/32-bit division (don't zero the high part of the dividend)
        div     ecx
        mov     dword ptr [rdi], edx
        ret
```
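The condition tested by `cmp edx, ecx` / `jb` can be modelled in C to check the claim on concrete values. This is a minimal sketch: the names `narrow_div_is_safe` and `div_rem_model` are hypothetical, and since C has no 64/32-bit divide operator, both paths are expressed as `n / d` with an assertion standing in for the narrow DIV's no-overflow requirement.

```c
#include <assert.h>
#include <stdint.h>

/* Proposed check: the narrow DIV (EDX:EAX divided by a 32-bit register)
   cannot overflow when the high word of the dividend is below the divisor. */
int narrow_div_is_safe(uint64_t n, uint32_t d) {
    return (uint32_t)(n >> 32) < d;
}

/* Hypothetical C model of the expected assembly. Both paths compute
   n / d; the inner assert documents that the quotient on the narrow
   path always fits in 32 bits. */
uint64_t div_rem_model(uint32_t *rem, uint32_t d, uint64_t n) {
    assert(d != 0);
    uint64_t q = n / d;
    if (narrow_div_is_safe(n, d))
        assert(q <= UINT32_MAX);   /* n < d * 2^32, hence q < 2^32 */
    *rem = (uint32_t)(n % d);
    return q;
}
```

For a nonzero divisor, the check clang performs today (high word equal to zero) is a special case of this condition, so the proposed comparison should take the fast path at least as often as the current one.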

Similar optimizations can be done for a 64/64-bit division: if the divisor fits in 32 bits and the high word of the dividend is smaller than the divisor, we can do a 64/32-bit division.
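As a sketch of the 64/64-bit case, the combined condition could be written as below. The function name is hypothetical, and this only models the predicate, not the code the backend would emit.

```c
#include <stdint.h>

/* Returns 1 when a 64/64-bit division n / d could be done with a
   64/32-bit DIV: the divisor fits in 32 bits and the high word of
   the dividend is below the divisor. */
int narrow_div_is_safe_64(uint64_t n, uint64_t d) {
    return (d >> 32) == 0 && (n >> 32) < d;
}
```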