http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54172

             Bug #: 54172
           Summary: [4.7 Regression] __cxa_guard_acquire thread-safety
                    issue
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: major
          Priority: P3
         Component: libstdc++
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: thi...@kde.org


Created attachment 27936
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27936
Proposed fix.

In commit 184110, the __cxa_guard_acquire implementation in libsupc++/guard.cc
was updated to use the new __atomic_* intrinsics instead of the old __sync_*
ones. I believe this has introduced a regression due to a race condition.
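
For reference, the relevant behavioural difference between the two families of
intrinsics is that __atomic_compare_exchange_n writes the value it observed
back into its "expected" argument when the comparison fails, whereas the old
__sync_* CAS simply returns the observed value and leaves local variables
alone. A minimal standalone illustration of this (not code from guard.cc):

// __sync_val_compare_and_swap returns what it observed; __atomic_compare_exchange_n
// additionally overwrites "expected" with the observed value when the CAS fails.
#include <iostream>

int main()
{
    int guard = 1;        // pretend the guard already holds "initialised"
    int expected = 0;     // the caller expects "uninitialised"

    // Old-style CAS: fails, returns the observed value (1), "expected" untouched.
    int observed = __sync_val_compare_and_swap(&guard, expected, 256);
    std::cout << "sync:   observed=" << observed
              << " expected=" << expected << '\n';   // expected is still 0

    // New-style CAS: fails, and "expected" is overwritten with the observed value.
    bool ok = __atomic_compare_exchange_n(&guard, &expected, 256, false,
                                          __ATOMIC_ACQ_REL, __ATOMIC_RELAXED);
    std::cout << "atomic: ok=" << ok
              << " expected=" << expected << '\n';   // expected is now 1
}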

== Proof ==

While debugging a program, I set a hardware watchpoint on a guard variable and
set gdb to continue execution upon stop. The output was:

Hardware watchpoint 1:
_ZGVZN12_GLOBAL__N_121Q_QGS_textCodecsMutex13innerFunctionEvE6holder

Old value = 0
New value = 256
0x000000381205f101 in __cxxabiv1::__cxa_guard_acquire (g=0x7ffff7dc9a60)
    at ../../../../libstdc++-v3/libsupc++/guard.cc:254
254                 if (__atomic_compare_exchange_n(gi, &expected, pending_bit, false,
Hardware watchpoint 1:
_ZGVZN12_GLOBAL__N_121Q_QGS_textCodecsMutex13innerFunctionEvE6holder

Old value = 256
New value = 1
__cxxabiv1::__cxa_guard_release (g=0x7ffff7dc9a60) at ../../../../libstdc++-v3/libsupc++/guard.cc:376
376             if ((old & waiting_bit) != 0)
[Switching to Thread 0x7fffebfff700 (LWP 113412)]
Hardware watchpoint 1:
_ZGVZN12_GLOBAL__N_121Q_QGS_textCodecsMutex13innerFunctionEvE6holder

Old value = 1
New value = 256
0x000000381205f101 in __cxxabiv1::__cxa_guard_acquire (g=0x7ffff7dc9a60)
    at ../../../../libstdc++-v3/libsupc++/guard.cc:254
254                 if (__atomic_compare_exchange_n(gi, &expected, pending_bit, false,
[New Thread 0x7ffff0a2d700 (LWP 113413)]
Hardware watchpoint 1:
_ZGVZN12_GLOBAL__N_121Q_QGS_textCodecsMutex13innerFunctionEvE6holder

Old value = 256
New value = 1
__cxxabiv1::__cxa_guard_release (g=0x7ffff7dc9a60) at ../../../../libstdc++-v3/libsupc++/guard.cc:376
376             if ((old & waiting_bit) != 0)

As can be seen from the output, the guard variable transitioned from 0 -> 256 -> 1 -> 256 -> 1.

== Analysis ==

The code in guard.cc is:

        int expected(0);
        const int guard_bit = _GLIBCXX_GUARD_BIT;
        const int pending_bit = _GLIBCXX_GUARD_PENDING_BIT;
        const int waiting_bit = _GLIBCXX_GUARD_WAITING_BIT;

        while (1)
          {
            if (__atomic_compare_exchange_n(gi, &expected, pending_bit, false,
                                            __ATOMIC_ACQ_REL,
                                            __ATOMIC_RELAXED))
              {
                // This thread should do the initialization.
                return 1;
              }

            if (expected == guard_bit)
              {
                // Already initialized.
                return 0;       
              }
            if (expected == pending_bit)
              {
                int newv = expected | waiting_bit;
                if (!__atomic_compare_exchange_n(gi, &expected, newv, false,
                                                 __ATOMIC_ACQ_REL,
                                                 __ATOMIC_RELAXED))
                  continue;

                expected = newv;
              }

            syscall (SYS_futex, gi, _GLIBCXX_FUTEX_WAIT, expected, 0);
          }

We have two threads running and they both reach __cxa_guard_acquire at more or
less the same time. On one thread, execution follows the expected path: the
first CAS succeeds, transitioning the guard variable from 0 to 256. That thread
will initialise the static.

On the second thread, the first CAS fails, so it proceeds to the second CAS,
trying to replace 256 with 768 (to indicate it is going to sleep).

In the meantime, the first thread calls __cxa_guard_release, which exchanges
the 256 for a 1.

Therefore, on the second thread, the second CAS fails and expected is now 1
(the intrinsic updated it with the value it observed). The continue sends it
back to the first CAS with expected == 1, and that CAS succeeds, changing the
guard from 1 back to 256, which is wrong: the second thread will now re-run the
initialisation of an already-initialised static.
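
To make the interleaving concrete, here is a minimal single-threaded replay of
what the second thread sees; the guard value is changed by hand where the first
thread would act, and the bit values (1, 256, 512) are taken from the transcript
above rather than from the _GLIBCXX_GUARD_* macros:

// Single-threaded replay of the second thread's view of the race
// (illustration only; in reality guard.cc runs this across two threads).
#include <cassert>

int main()
{
    const int guard_bit   = 1;    // stands in for _GLIBCXX_GUARD_BIT
    const int pending_bit = 256;  // stands in for _GLIBCXX_GUARD_PENDING_BIT
    const int waiting_bit = 512;  // stands in for _GLIBCXX_GUARD_WAITING_BIT

    int guard = pending_bit;      // thread one already won the first CAS (0 -> 256)
    int expected = pending_bit;   // thread two's failed first CAS left 256 in expected

    // Thread one finishes the initialisation and releases the guard: 256 -> 1.
    guard = guard_bit;

    // Thread two's second CAS (256 -> 768) now fails, and the intrinsic
    // overwrites "expected" with the observed value 1.
    bool ok = __atomic_compare_exchange_n(&guard, &expected,
                                          pending_bit | waiting_bit, false,
                                          __ATOMIC_ACQ_REL, __ATOMIC_RELAXED);
    assert(!ok && expected == guard_bit);

    // After the "continue", the first CAS is retried with expected == 1,
    // so it succeeds and drags the guard from 1 back to 256: the bug.
    ok = __atomic_compare_exchange_n(&guard, &expected, pending_bit, false,
                                     __ATOMIC_ACQ_REL, __ATOMIC_RELAXED);
    assert(ok && guard == pending_bit);
    return 0;
}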

== Solution ==

This issue appears to be caused by the combination of the new atomic intrinsics
updating the expected variable on failure and the loop reusing that value on
the next iteration. If the second CAS fails, the code needs to inspect the
value written back into expected to determine what to do next (see the sketch
after the list below). The possible values are:

 0: the other thread aborted, we should try again -> continue
 1: initialisation completed, we should return 0
 (256: can't happen, since the CAS would have succeeded in that case)
 768: yet another thread succeeded in setting the waiting bit, we should sleep
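
For illustration, here is a sketch of how a failed second CAS could handle
these cases by inspecting the value the intrinsic wrote back into expected.
This is only a sketch of the idea, using the same hard-coded bit values as in
the transcript above; it is not the attached patch, which may differ in the
details:

// Sketch of retry logic that inspects the value a failed CAS writes back into
// "expected" (illustration of the idea only, not the attached patch).
#include <sys/syscall.h>
#include <linux/futex.h>
#include <unistd.h>

int guard_acquire_sketch(int* gi)
{
    const int guard_bit   = 1;    // _GLIBCXX_GUARD_BIT value assumed from the transcript
    const int pending_bit = 256;  // _GLIBCXX_GUARD_PENDING_BIT value assumed likewise
    const int waiting_bit = 512;  // _GLIBCXX_GUARD_WAITING_BIT value assumed likewise

    int expected = 0;
    while (1)
    {
        if (__atomic_compare_exchange_n(gi, &expected, pending_bit, false,
                                        __ATOMIC_ACQ_REL, __ATOMIC_RELAXED))
            return 1;                 // we won the race: run the initialisation

        if (expected == guard_bit)
            return 0;                 // already initialised

        if (expected == pending_bit)
        {
            int newv = expected | waiting_bit;
            if (!__atomic_compare_exchange_n(gi, &expected, newv, false,
                                             __ATOMIC_ACQ_REL, __ATOMIC_RELAXED))
            {
                // The CAS failed; "expected" now holds the current guard value.
                if (expected == guard_bit)
                    return 0;         // the other thread completed in the meantime
                if (expected == 0)
                    continue;         // the other thread aborted; retry the first CAS
                // Otherwise another thread already set the waiting bit (768):
                // fall through and sleep on that value.
            }
            else
                expected = newv;      // we set the waiting bit ourselves (768)
        }

        // Sleep until the guard value changes, then re-examine it from scratch.
        syscall(SYS_futex, gi, FUTEX_WAIT, expected, 0);
        expected = 0;
    }
}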

The attached patch is a proposed solution to the problem, but I have not been
able to test it yet.
