Issue: 150988
Summary: AArch64 misuse of zero register for exclusive store status (maybe latent in 20.x)
Labels: new issue
Assignees: (none)
Reporter: neldredge
I found a bug in LLVM 19.x and earlier where incorrect code is generated for certain atomic operations on AArch64. The incorrect code no longer manifests in 20.x, or at least I cannot find a test case; but this is due to an unrelated change, and I think the bug may still be latent.
Compile the following code with `clang -O -march=armv8-a -mno-outline-atomics foo.cc`:
```
void foo(volatile double *x) {
  __atomic_fetch_add(x, 42.0, __ATOMIC_RELAXED);
}
```
[Try on godbolt](https://godbolt.org/z/s4TYf45fE)
With 19.x and earlier, this expands to a compare-and-swap loop, where the compare-and-swap is implemented with an LDXR/STXR-style exclusive pair. This is suboptimal but would in principle work correctly. However, the store side looks like this:
```
stlxr   wzr, x8, [x0]   // status written to wzr: success/failure is discarded
cbnz    wzr, .LBB0_3    // meant to retry on failure, but wzr always reads as zero
b       .LBB0_1
```
The zero register `wzr` is used for the status operand of `stlxr`! As such, `stlxr` is effectively treated as if it always succeeds, and the `cbnz` branch that should retry on failure is never taken. So if the exclusive store does in fact fail, due to a concurrent store or an interrupt, the failure is ignored and the store never occurs at all.
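To make the failure mode concrete, here is a sketch of a runtime reproducer (the file name `repro.cc` and the thread/iteration counts are my choices, not from the original report; since a lost update requires the exclusive store to actually fail under contention, it may take several runs to observe):
```
// repro.cc -- illustrative only; build with
//   clang++ -O -march=armv8-a -mno-outline-atomics -pthread repro.cc
#include <cstdio>
#include <thread>

volatile double counter = 0.0;

// Each thread performs `iters` relaxed floating-point fetch-adds.
void hammer(int iters) {
  for (int i = 0; i < iters; ++i)
    __atomic_fetch_add(&counter, 1.0, __ATOMIC_RELAXED);
}

int main() {
  constexpr int kIters = 1000000;
  std::thread t1(hammer, kIters);
  std::thread t2(hammer, kIters);
  t1.join();
  t2.join();
  // Correct codegen always prints 2000000; with the miscompiled loop,
  // failed exclusive stores are treated as successes and updates are lost.
  std::printf("counter = %.0f (expected %d)\n", counter, 2 * kIters);
}
```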
If I understand the situation correctly, the whole sequence is initially treated as a [`CMP_SWAP_64` pseudo-instruction](https://github.com/llvm/llvm-project/blob/b2322772f2ab97de60db906a591353a5ef77cdfe/llvm/lib/Target/AArch64/AArch64InstrAtomics.td#L512), which has a "scratch" output corresponding to the status register. Since this output is not used in any surrounding code, the [aarch64-dead-defs](https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AArch64/AArch64DeadRegisterDefinitionsPass.cpp) pass identifies it as dead and assigns it `wzr`. But the pseudo-instruction is later [expanded](https://github.com/llvm/llvm-project/blob/9c82f87aec12c77444edf04d46af4d8405becc25/llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp#L235) into multiple instructions, which do write and read that register.
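Put differently, the rewrite to `wzr` is only sound if nothing downstream ever reads the def, and a late-expanded pseudo hides exactly such a read. A minimal sketch of the kind of guard that would restore that invariant (the function name and placement are hypothetical, not the actual pass code):
```
#include "llvm/CodeGen/MachineInstr.h"

using namespace llvm;

// Hypothetical guard, for illustration only: an instruction that is still
// a pseudo at this point may have its defs read inside its post-RA
// expansion (CMP_SWAP_* reads the "scratch" status register in its LL/SC
// loop), so such a def must not be rewritten to the zero register.
static bool mayRewriteDeadDefToZeroReg(const MachineInstr &MI) {
  if (MI.isPseudo())
    return false; // the expansion may read the "dead" def
  return true;
}
```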
----
In 20.x, the implementation of floating-point atomics [changed](https://github.com/llvm/llvm-project/commit/26c3a8404f1b3327a0982faeeaee94b08d1ee481), such that on ARMv8.0-A they are now implemented with a more efficient direct LL/SC loop, which does not suffer from this bug. [Try on godbolt](https://godbolt.org/z/f7KqWcbsf). Indeed, based on [`AArch64TargetLowering::shouldExpandAtomicRMWInIR`](https://github.com/llvm/llvm-project/blob/5f20518f5b4734d01848dfe44d24aed195dc2043/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp#L28380), it looks like a compare-exchange loop would only be used in the following cases:
- `-O0`, where the dead-register-definitions pass is presumably not run (see the probe sketch after this list)
- LSE (ARMv8.1-A and later), where the `cas` instruction is available and would be used instead
- When the operation may lower to a library call. If it expands to a call into the runtime library, then presumably all is well, but I'm not sure whether that always happens.
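For that first bullet, here is a probe sketch (the file name `probe.cc` is mine; the assumption, which I have not verified in every configuration, is that at `-O0` a plain compare-exchange still lowers through the `CMP_SWAP_*` pseudos). Compile with `clang -O0 -march=armv8-a -mno-outline-atomics -S probe.cc` and check by hand which register the emitted `stlxr` writes its status to:
```
// probe.cc -- illustrative only: if the status operand of the emitted
// stlxr ever shows up as wzr again, the latent bug has resurfaced.
bool probe(long *p, long expected, long desired) {
  return __atomic_compare_exchange_n(p, &expected, desired,
                                     /*weak=*/false,
                                     __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```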
As such, I haven't been able to reproduce the issue in 20.x. But it doesn't seem like there have been any relevant changes to the dead register definition pass, nor to the `CMP_SWAP_64` pseudo-instruction, which makes me think this bug is still latent and could reappear if later changes invoke `CMP_SWAP_*` in other settings. I also wonder if any other pseudo-instructions could be affected.