https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86019
Bug ID: 86019
Summary: Unref implementation using atomic_thread_fence generates worse code on x86-64 in gcc 8.1 than 7.3
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: klempner at imsanet dot org
Target Milestone: ---

See https://godbolt.org/g/Tu80RI (specifically the codegen for do_unref3())

Simplified version: https://godbolt.org/g/Xbn6n6

This is the unref half of a refcount implementation:

    #include <atomic>

    std::atomic<int> refcount;

    bool do_unref() {
        int old_count = refcount.fetch_sub(1, std::memory_order_release);
        if (old_count == 1) {
            std::atomic_thread_fence(std::memory_order_acquire);
        }
        return old_count == 1;
    }

In particular, unref needs release semantics on every decrement, but only needs acquire semantics on the last decrement. std::atomic_thread_fence(std::memory_order_acquire) should be (approximately) a no-op on x86.

gcc 7.3 generated the right code here:

    do_unref():
            lock sub        DWORD PTR refcount[rip], 1
            sete    al
            ret

gcc 8.1 generates a branch choosing between duplicate codepaths based on whether old_count == 1:

    do_unref():
            mov     eax, -1
            lock xadd       DWORD PTR refcount[rip], eax
            cmp     eax, 1
            je      .L4
            cmp     eax, 1
            sete    al
            ret
    .L4:
            cmp     eax, 1
            sete    al
            ret

It also appears to fail to optimize based on decrementing the constant value 1: where gcc 7.3 emits lock sub and reads the result from the flags with sete, gcc 8.1 emits lock xadd followed by a separate cmp.
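
For context, a minimal sketch of how an unref like the one above is typically paired with a ref and a destructor call. This is not taken from the linked godbolt examples: the Object type, ref(), unref() and demo() are hypothetical names used for illustration, and the relaxed increment in ref() is an assumption based on the common refcounting idiom.

```cpp
#include <atomic>

// Hypothetical refcounted object, for illustration only.
struct Object {
    std::atomic<int> refcount{1};  // the creator holds the first reference
    int payload = 0;
};

// Taking a reference needs no ordering (assumption: the caller already
// holds a valid reference, so a relaxed increment is sufficient).
void ref(Object* o) {
    o->refcount.fetch_add(1, std::memory_order_relaxed);
}

// Dropping a reference: release on every decrement, acquire only on the
// last one, so that all prior writes to *o are visible before it is deleted.
// Returns true if this call dropped the last reference.
bool unref(Object* o) {
    int old_count = o->refcount.fetch_sub(1, std::memory_order_release);
    if (old_count == 1) {
        std::atomic_thread_fence(std::memory_order_acquire);
        delete o;
        return true;
    }
    return false;
}

// Single-threaded usage check: only the final unref reports true.
bool demo() {
    Object* o = new Object;   // refcount == 1
    ref(o);                   // refcount == 2
    bool first = unref(o);    // 2 -> 1, not the last reference
    bool second = unref(o);   // 1 -> 0, last reference; object deleted
    return !first && second;
}
```

In this sketch the fence sits only on the rarely taken last-decrement path, which is why the report expects the branch to disappear on x86-64, where the acquire fence should cost (approximately) nothing.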