| Issue |
108840
|
| Summary |
Don't hoist NOTs out of a loop when they would be folded into ANDN/ORN anyway (except sometimes on Arm)
|
| Labels |
new issue
|
| Assignees |
|
| Reporter |
Validark
|
[Godbolt link](https://zig.godbolt.org/#z:OYLghAFBqd5QCxAYwPYBMCmBRdBLAF1QCcAaPECAMzwBtMA7AQwFtMQByARg9KtQYEAysib0QXABx8BBAKoBnTAAUAHpwAMvAFYTStJg1AAvPMFJL6yAngGVG6AMKpaAVxYM9DgDJ4GmADl3ACNMYgkuUgAHVAVCWwZnNw89GLibAV9/IJZQ8K5Iy0xrBKECJmICJPdPQswrDIYyioIswJCwiItyyuqUwp7Wv3bczoKASgtUV2Jkdg5MVRjKgGoqBjXUVAgmEBXXADYAFlIV4L3Do/H945WAUgB2ACE7jQBBFc%2BVgDcKlYV7gBmAAiKyY9wejjOd0BL3eXx%2Bf2ImAIewAAgAVACeUUwAHkqBAFNcYaCNDC4W9Xh8vgB3BB0TArCAEYiuTAk57UhEItAMBQEFaEMJA0EKCncnmfZGCx6OUlCghhCXwqX/e4AJgOCoAflBhcRNS8NQBWZkGoHywFQrjjTlQ/VKw0wq1QjUkrUrHVMO0qmlSvBUZkA0kKjTXYLIpgAaz9CMewOpkulKJmGxlfoTHEmtE4Jt4ng4WlIqE4AC0zP9prMmXcNYCeKRUUXs5NoyATRoAHQATg0Gg1GoeB1NkgeXBNkVzHCOvBYEn7pELxdLHF4ChAGibmmzpDgsBgiBQqBYUUZZAoEDQp/PIGMrNcDGjfDoTo3EHOLdIwT8FSxnEbH9mGILE8WCbRimbRtrzYQQ8QYWh/y/LBglcYBHDEWgN24XgsBYQxgHEZC8GREpvkwbDi0WYpXCVADeD8JVp2LWg8EjP9nCwHcm2IPB5xwyYqAMYAFAANTwTBaTxXFC0bfhBBEMR2C4A4ZEERQVHUL9dA1fQCLvMx9DYjdIEmVAokabCAFo8RWAAlepMCYJQADFnMFKyemAFEwQqZAEGOKzWIYVxVBWKyWGQKJXFJJgoiiegAH0WEBXhUHI4heKwEyIEmIoSjsCAHD6Twh1IHxhhyPIQAeaJYniAQSpAMq0gahg2iqzpavyxpml6Fwama7rHIKppBg6jpwm6wYmrKgUWgm0Ypry6s5gkHM8wLbjVxWUxgBWbsuCOLsNCsrBvi7Cc6yedAjmCZBjg0QFJGZXBCBITUG3GXhmy0O1SAQJysHCXLSHbI4exOnseweB56weI4uA1SQTX0ThZ1Iech17Y4JwOB4e0kLgCaR3TlzSzh103bcW0mfcj2vM96AvShGdve82SfF9aDfShP2LIC/3o79fxAsCIOsYWYMYAh4MQ7iULQjDaCw4W8IIoji3wUibHIyjeGo5BaPmRtGPqbjWPYkDOPmYtWT4%2BjBOEsSJKkmThfk4RRHEFS1PkJQ1G43Qp30vajOCHKzIshJrNshz6GczA3IFcKvJ81RJAORLAuC0Lwsi6LSWMBgMox9Kwiyij4DykbGnsBgnAG/pyobxbqsiVrGiajv6saNuxgsWvShmpvakHhph4WyrJr0eb%2BuSMe56GbIZ9tKYZjWtfp3zJdts4XbK0O47TvOy6TWu277se57XvwIhnXrW0fp3f7AaYYHKA2mc5xAQFu2HSQkgNAPDHAcHsGojgHFRuTEslMLDU1%2BruemEAkBs2ZuQVmJ4madA5o%2BZ8NAeZhHfPzXggsQLCzIaBcCkEpYnlgrLBCSEtaYFQuhTC2FGzqyMJrXCJFIJ4D1txQ2xthZm2YrwS2xAOIYFtj9Xi/EeBOyYCJcSklpKMA9rIRSPtVKew0oHbSv89JGAMuYS2kcSzRwELHeyjlE5CHwirVOFRvKCl4goZA3wc5%2BDzhFKKMUQRxBoORRKqhkYrgypXCxPUEj10bgvCQukKoryWhIE4ncEjd10hkzI09UlHXHvwgQfUqij1nkPYp408nt26C0butTKj93yCtDeykv47xgTtPaB1LrHzOpgC6V0NQ3Tug9I4T0XoQDevfT6T8aZ/S/hjeciMuxageE9E0gI
IGSB7COMme81zwK3Ig/64NIYaGhrDeGiNkao2nKlXeX5VzP1pl/DUP8uCLk6XAk5kwMpxDsEcIAA)
This code:
```zig
export fn foo(a: u64, b: u64) u64 {
    var s = a | b;
    var ret: @TypeOf(s) = 0;
    while (true) {
        const iter = s;
        ret |= iter;
        s &= ~((iter +% (iter << 1)) | ((iter << 2) & ~a));
        if (s == 0) break;
    }
    return ret;
}
```
results in this emitted assembly for Zen 4:
```asm
foo:
        or      rsi, rdi
        not     rdi
        xor     eax, eax
.LBB0_1:
        lea     rdx, [4*rsi]
        lea     rcx, [rsi + 2*rsi]
        or      rax, rsi
        and     rdx, rdi        ; we could have just used `andn`
        or      rdx, rcx
        andn    rsi, rdx, rsi
        jne     .LBB0_1
        ret
```
As you can see, `not rdi` is hoisted out of the loop, even though the `and` inside the loop could have been an `andn` on the un-negated value. The same thing happens on the SiFive X280 (aggressive unrolling disabled via a size-optimized build):
```asm
foo:
        mv      a2, a0
        li      a0, 0
        or      a1, a1, a2
        not     a2, a2
.LBB0_1:
        slli    a3, a1, 2
        sh1add  a4, a1, a1
        and     a3, a3, a2      ; could have used `andn`
        or      a0, a0, a1
        or      a3, a3, a4
        andn    a1, a1, a3
        bnez    a1, .LBB0_1
        ret
```
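For reference, here is a minimal sketch of the fold being requested, written back in Zig. The helper `andn` and the names `fooOriginal` and `fooAndn` are hypothetical and exist only for this illustration; `andn(x, y)` simply models an and-not instruction as `x & ~y`. The sketch checks that the loop body can be written with two and-not operations and no separately kept `~a`.

```zig
const std = @import("std");

// Hypothetical helper modelling an and-not instruction: x & ~y.
fn andn(x: u64, y: u64) u64 {
    return x & ~y;
}

// The loop from the report, unchanged apart from the name.
fn fooOriginal(a: u64, b: u64) u64 {
    var s = a | b;
    var ret: u64 = 0;
    while (true) {
        const iter = s;
        ret |= iter;
        s &= ~((iter +% (iter << 1)) | ((iter << 2) & ~a));
        if (s == 0) break;
    }
    return ret;
}

// Same loop with both NOTs expressed as and-not, so no inverted
// copy of `a` has to stay live across iterations.
fn fooAndn(a: u64, b: u64) u64 {
    var s = a | b;
    var ret: u64 = 0;
    while (true) {
        const iter = s;
        ret |= iter;
        const t = andn(iter << 2, a) | (iter +% (iter << 1));
        s = andn(s, t);
        if (s == 0) break;
    }
    return ret;
}

test "and-not form matches the original loop" {
    const inputs = [_]u64{ 0, 1, 2, 3, 0xff, 0xdeadbeef, std.math.maxInt(u64) };
    for (inputs) |a| {
        for (inputs) |b| {
            try std.testing.expectEqual(fooOriginal(a, b), fooAndn(a, b));
        }
    }
}
```

Saved to a file, this can be checked with `zig test`. It only demonstrates the source-level equivalence; which instructions a backend actually selects is a separate question.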
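However, on the Apple M3, it actually does make sense to hoist `mvn` out of the loop in this case, because `and x11, x8, x9, lsl #2` folds the shift into the AND. `bic` also accepts a shifted register, but the shift applies to the operand that gets inverted, so `bic x11, x8, x9, lsl #2` would compute `x8 & ~(x9 << 2)` rather than the `(iter << 2) & ~a` we need. Using `bic` here would therefore cost a separate shift instruction inside the loop.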
Apple M3 emit:
```asm
foo:
mov x8, x0
mov x0, #0
orr x9, x1, x8
mvn x8, x8
.LBB0_1:
orr x0, x9, x0
add x10, x9, x9, lsl #1
and x11, x8, x9, lsl #2
orr x10, x11, x10
bics x9, x9, x10
b.ne .LBB0_1
ret
```
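The operand-order constraint on AArch64 can be sketched in Zig as well. The helpers below (`andLsl2`, `bicLsl2`, `bic`) are hypothetical models of the shifted-register forms of `and` and `bic`, not real builtins. In `bic`, the shift applies to the operand that is inverted, so a single `bic ... lsl #2` cannot produce `(iter << 2) & ~a`, whereas `and` with a pre-inverted `a` can.

```zig
const std = @import("std");

// Models `and xd, xn, xm, lsl #2`: xn & (xm << 2).
fn andLsl2(xn: u64, xm: u64) u64 {
    return xn & (xm << 2);
}

// Models `bic xd, xn, xm, lsl #2`: xn & ~(xm << 2).
// The shift is applied to the operand that gets inverted.
fn bicLsl2(xn: u64, xm: u64) u64 {
    return xn & ~(xm << 2);
}

// Models plain `bic xd, xn, xm`: xn & ~xm.
fn bic(xn: u64, xm: u64) u64 {
    return xn & ~xm;
}

test "shifted bic does not compute (iter << 2) & ~a" {
    const a: u64 = 0b0100;
    const iter: u64 = 0b0011;
    const wanted = (iter << 2) & ~a;

    // One instruction in the loop when ~a is hoisted.
    try std.testing.expectEqual(wanted, andLsl2(~a, iter));

    // Neither operand order of the shifted bic gives the wanted value here.
    try std.testing.expect(bicLsl2(iter, a) != wanted);
    try std.testing.expect(bicLsl2(a, iter) != wanted);

    // Without the hoisted ~a, a separate shift is needed before the bic.
    try std.testing.expectEqual(wanted, bic(iter << 2, a));
}
```

With these inputs the wanted value is `0b1000`, which only the hoisted-`mvn` form and the two-step shift-then-`bic` form reproduce.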