https://llvm.org/bugs/show_bug.cgi?id=24191
Bug ID: 24191
Summary: Possibly inefficient std::atomic<int> codegen on x86 for simple arithmetic
Product: clang
Version: 3.7
Hardware: PC
OS: Linux
Status: NEW
Severity: normal
Priority: P
Component: LLVM Codegen
Assignee: [email protected]
Reporter: [email protected]
CC: [email protected]
Classification: Unclassified
[I also reported this issue to GCC:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66881]
Consider these two simple versions of addition:
    #include <atomic>

    std::atomic<int> x;
    int y;

    void f(int a) {
        x.store(x.load(std::memory_order_relaxed) + a, std::memory_order_relaxed);
    }

    void g(int a) {
        y += a;
    }
Clang generates the following assembly (https://goo.gl/IWtwkr):
    f(int):                          # @f(int)
        mov eax, dword ptr [rip + x]
        add eax, edi
        mov dword ptr [rip + x], eax
        ret

    g(int):                          # @g(int)
        add dword ptr [rip + y], edi
        ret
Now, it is clear to me that the correct atomic codegen for store() and load()
is "mov", as it appears here, but why aren't the two consecutive operations
folded into a single add? Aren't the semantics and the memory ordering the
same? x86 says that (most) "reads" and "writes" are strongly ordered; doesn't
that apply to the read and write produced by "add", too?
(My original motivation came from a variant of this with floats, where the
non-atomic code executed noticeably faster, even though I would have expected
the two to produce the same machine code.)
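To make the comparison concrete, here is a minimal sketch that puts f and g next to a third variant using fetch_add. The function h and the helper check_single_threaded are hypothetical and not part of my original test case. As far as I understand, fetch_add is a single atomic read-modify-write and therefore needs a "lock"-prefixed instruction on x86, whereas f only asks for the load and the store to each be atomic on their own, which is why I would expect a plain "add" to memory to be sufficient for it.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> x{0};
int y = 0;

// Relaxed load followed by a relaxed store: each access is atomic on its
// own, but the pair is NOT one atomic read-modify-write.
void f(int a) {
    x.store(x.load(std::memory_order_relaxed) + a, std::memory_order_relaxed);
}

// Plain non-atomic addition, for comparison.
void g(int a) {
    y += a;
}

// Hypothetical third variant (not in the original report): a single atomic
// read-modify-write, which is a stronger guarantee than f provides.
void h(int a) {
    x.fetch_add(a, std::memory_order_relaxed);
}

// Hypothetical single-threaded sanity check: all three variants add the
// same amount when no other thread touches x or y.
void check_single_threaded() {
    x.store(0, std::memory_order_relaxed);
    y = 0;
    f(2);
    g(2);
    assert(x.load(std::memory_order_relaxed) == 2 && y == 2);
    h(3);
    assert(x.load(std::memory_order_relaxed) == 5);
}
```

The difference between f and h only shows up with concurrent writers: two threads calling f at the same time may lose one of the updates, while two threads calling h may not. Since f never promised more than that, folding its load and store into one unlocked "add" should not take away anything it guarantees.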
_______________________________________________
LLVMbugs mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/llvmbugs