[Bug c++/86019] New: Unref implementation using atomic_thread_fence generates worse code on x86-64 in gcc 8.1 than 7.3

klempner at imsanet dot org Thu, 31 May 2018 11:26:29 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86019


            Bug ID: 86019
           Summary: Unref implementation using atomic_thread_fence
                    generates worse code on x86-64 in gcc 8.1 than 7.3
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: klempner at imsanet dot org
  Target Milestone: ---

See https://godbolt.org/g/Tu80RI (specifically the codegen for do_unref3())

Simplified version: https://godbolt.org/g/Xbn6n6

This is the unref half of a refcount implementation:

#include <atomic>

std::atomic<int> refcount;

bool do_unref() {
    int old_count = refcount.fetch_sub(1, std::memory_order_release);
    if (old_count == 1) {
        std::atomic_thread_fence(std::memory_order_acquire);
    }
    return old_count == 1;
}

In particular, unref needs release semantics on every decrement, but only needs
acquire semantics on the last decrement.
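
To illustrate why only the last decrement needs the acquire fence, here is a minimal sketch of how this pattern might sit inside an intrusive refcounted object. The names RefCounted, ref, unref and destroyed_flag are hypothetical and do not come from the code in this report. The release on every decrement publishes each owner's writes; the acquire fence on the final decrement makes those writes visible to the thread that runs the destructor.

```cpp
#include <atomic>

// Hypothetical intrusive refcounted object, for illustration only.
struct RefCounted {
    std::atomic<int> refcount{1};  // the creator holds the first reference
    bool* destroyed_flag;          // set by the destructor so callers can observe it

    explicit RefCounted(bool* flag) : destroyed_flag(flag) {}
    ~RefCounted() { *destroyed_flag = true; }

    void ref() {
        // No ordering needed: the caller already holds a reference.
        refcount.fetch_add(1, std::memory_order_relaxed);
    }

    // Returns true if this call dropped the last reference and deleted the object.
    bool unref() {
        // Release on every decrement: publish this owner's writes to the object.
        int old_count = refcount.fetch_sub(1, std::memory_order_release);
        if (old_count == 1) {
            // Acquire only on the last decrement: see every other owner's
            // writes before the object is destroyed.
            std::atomic_thread_fence(std::memory_order_acquire);
            delete this;
            return true;
        }
        return false;
    }
};
```

A caller would allocate with new, call ref() for each additional owner, and have every owner call unref() exactly once.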

std::atomic_thread_fence(std::memory_order_acquire) should be (approximately) a
no-op on x86.

gcc 7.3 generated the right code here:

do_unref():
        lock sub        DWORD PTR refcount[rip], 1
        sete    al
        ret

gcc 8.1 generates a branch choosing between duplicate codepaths based on
whether old_count == 1:

do_unref():
        mov     eax, -1
        lock xadd       DWORD PTR refcount[rip], eax
        cmp     eax, 1
        je      .L4
        cmp     eax, 1
        sete    al
        ret
.L4:
        cmp     eax, 1
        sete    al
        ret

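gcc 8.1 also appears to miss the optimization for the constant decrement of 1:
gcc 7.3 tests the result of lock sub directly through the flags (sete), whereas
gcc 8.1 fetches the old value with lock xadd and then compares it against 1.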
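
For comparison, one possible alternative formulation folds the acquire into the decrement itself by using memory_order_acq_rel, which removes the conditional fence from the source. This is a sketch under assumptions, not something taken from the report: the names refcount_alt and do_unref_acq_rel are hypothetical, and no claim is made here about what either gcc version emits for it. It requests acquire ordering on every decrement rather than only the last. On x86 a lock-prefixed instruction is already a full barrier, but on weakly ordered targets the extra ordering may cost more.

```cpp
#include <atomic>

// Hypothetical counter, separate from the refcount in the report.
std::atomic<int> refcount_alt;

// Returns true if this call dropped the count from 1 to 0.
bool do_unref_acq_rel() {
    // acq_rel applies both orderings to every decrement, so no separate
    // fence is needed on the last one.
    return refcount_alt.fetch_sub(1, std::memory_order_acq_rel) == 1;
}
```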
