Issue 108840
Summary Don't hoist NOT's out of loop that will be folded into ANDN/ORN anyway (except sometimes on Arm)
Labels new issue
Assignees
Reporter Validark
    [Godbolt link](https://zig.godbolt.org/#z:OYLghAFBqd5QCxAYwPYBMCmBRdBLAF1QCcAaPECAMzwBtMA7AQwFtMQByARg9KtQYEAysib0QXABx8BBAKoBnTAAUAHpwAMvAFYTStJg1AAvPMFJL6yAngGVG6AMKpaAVxYM9DgDJ4GmADl3ACNMYgkuUgAHVAVCWwZnNw89GLibAV9/IJZQ8K5Iy0xrBKECJmICJPdPQswrDIYyioIswJCwiItyyuqUwp7Wv3bczoKASgtUV2Jkdg5MVRjKgGoqBjXUVAgmEBXXADYAFlIV4L3Do/H945WAUgB2ACE7jQBBFc%2BVgDcKlYV7gBmAAiKyY9wejjOd0BL3eXx%2Bf2ImAIewAAgAVACeUUwAHkqBAFNcYaCNDC4W9Xh8vgB3BB0TArCAEYiuTAk57UhEItAMBQEFaEMJA0EKCncnmfZGCx6OUlCghhCXwqX/e4AJgOCoAflBhcRNS8NQBWZkGoHywFQrjjTlQ/VKw0wq1QjUkrUrHVMO0qmlSvBUZkA0kKjTXYLIpgAaz9CMewOpkulKJmGxlfoTHEmtE4Jt4ng4WlIqE4AC0zP9prMmXcNYCeKRUUXs5NoyATRoAHQATg0Gg1GoeB1NkgeXBNkVzHCOvBYEn7pELxdLHF4ChAGibmmzpDgsBgiBQqBYUUZZAoEDQp/PIGMrNcDGjfDoTo3EHOLdIwT8FSxnEbH9mGILE8WCbRimbRtrzYQQ8QYWh/y/LBglcYBHDEWgN24XgsBYQxgHEZC8GREpvkwbDi0WYpXCVADeD8JVp2LWg8EjP9nCwHcm2IPB5xwyYqAMYAFAANTwTBaTxXFC0bfhBBEMR2C4A4ZEERQVHUL9dA1fQCLvMx9DYjdIEmVAokabCAFo8RWAAlepMCYJQADFnMFKyemAFEwQqZAEGOKzWIYVxVBWKyWGQKJXFJJgoiiegAH0WEBXhUHI4heKwEyIEmIoSjsCAHD6Twh1IHxhhyPIQAeaJYniAQSpAMq0gahg2iqzpavyxpml6Fwama7rHIKppBg6jpwm6wYmrKgUWgm0Ypry6s5gkHM8wLbjVxWUxgBWbsuCOLsNCsrBvi7Cc6yedAjmCZBjg0QFJGZXBCBITUG3GXhmy0O1SAQJysHCXLSHbI4exOnseweB56weI4uA1SQTX0ThZ1Iech17Y4JwOB4e0kLgCaR3TlzSzh103bcW0mfcj2vM96AvShGdve82SfF9aDfShP2LIC/3o79fxAsCIOsYWYMYAh4MQ7iULQjDaCw4W8IIoji3wUibHIyjeGo5BaPmRtGPqbjWPYkDOPmYtWT4%2BjBOEsSJKkmThfk4RRHEFS1PkJQ1G43Qp30vajOCHKzIshJrNshz6GczA3IFcKvJ81RJAORLAuC0Lwsi6LSWMBgMox9Kwiyij4DykbGnsBgnAG/pyobxbqsiVrGiajv6saNuxgsWvShmpvakHhph4WyrJr0eb%2BuSMe56GbIZ9tKYZjWtfp3zJdts4XbK0O47TvOy6TWu277se57XvwIhnXrW0fp3f7AaYYHKA2mc5xAQFu2HSQkgNAPDHAcHsGojgHFRuTEslMLDU1%2BruemEAkBs2ZuQVmJ4madA5o%2BZ8NAeZhHfPzXggsQLCzIaBcCkEpYnlgrLBCSEtaYFQuhTC2FGzqyMJrXCJFIJ4D1txQ2xthZm2YrwS2xAOIYFtj9Xi/EeBOyYCJcSklpKMA9rIRSPtVKew0oHbSv89JGAMuYS2kcSzRwELHeyjlE5CHwirVOFRvKCl4goZA3wc5%2BDzhFKKMUQRxBoORRKqhkYrgypXCxPUEj10bgvCQukKoryWhIE4ncEjd10hkzI09UlHXHvwgQfUqij1nkPYp408nt26C0butTKj93yCtDeykv47xgTtPaB1LrHzOpgC6V0NQ3Tug9I4T0XoQDevfT6T8aZ/S/hjeciMuxageE9E0gIIGSB7COMme81zwK3Ig/64NIYaGhrDeGiNkao2nKlXeX5VzP1pl/DUP8uCLk6XAk5kwMpxDsEcIAA)

This code:

```zig
export fn foo(a: u64, b: u64) u64 {
    var s = a | b;
    var ret: @TypeOf(s) = 0;

    while (true) {
        const iter = s;
        ret |= iter;
        s &= ~((iter +% (iter << 1)) | ((iter << 2) & ~a));
        if (s == 0) break;
    }

    return ret;
}
```

Results in this emit for Zen 4:

```asm
foo:
        or      rsi, rdi
        not rdi
        xor     eax, eax
.LBB0_1:
        lea     rdx, [4*rsi]
        lea     rcx, [rsi + 2*rsi]
        or      rax, rsi
 and     rdx, rdi; we could have just used `andn`
        or rdx, rcx
        andn    rsi, rdx, rsi
        jne     .LBB0_1
 ret
```

As you can see, we hoist `not rdi` out of the loop, even though we could have used `andn`. The same situation happens to the Sifive x280 (aggressive unrolling disabled via size-optimized build option):

```asm
foo:
        mv      a2, a0
        li a0, 0
        or      a1, a1, a2
        not     a2, a2
.LBB0_1:
 slli    a3, a1, 2
        sh1add  a4, a1, a1
        and     a3, a3, a2; could have used `andn`
        or      a0, a0, a1
        or a3, a3, a4
        andn    a1, a1, a3
        bnez    a1, .LBB0_1
 ret
```

However, on the Apple M3, it actually does make sense to hoist `mvn` out of the loop in this case, because we can do `and x11, x8, x9, lsl #2` but we can't do `bic x11, x8, x9, lsl #2` (I assume).

Apple M3 emit:

```asm
foo:
        mov     x8, x0
        mov     x0, #0
        orr     x9, x1, x8
        mvn x8, x8
.LBB0_1:
        orr     x0, x9, x0
        add     x10, x9, x9, lsl #1
        and     x11, x8, x9, lsl #2
        orr     x10, x11, x10
        bics    x9, x9, x10
        b.ne    .LBB0_1
 ret
```

_______________________________________________
llvm-bugs mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs

Reply via email to