https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99277
--- Comment #16 from Thomas Rodgers <rodgertq at gcc dot gnu.org> ---
(In reply to Thiago Macieira from comment #15)
> > > 5) std::barrier implementation also uses a type that futex(2) can't
> > > handle
> > > barrier still uses a 1-byte enum for the atomic waits.
> > That can only now be fixed for libstdc++.so.7, then.

The original implementation came from Olivier Giroux and is part of libc++. The libc++ implementation also does not use a type that futex or ulock_wait/wake (uint64_t) can handle. I have discussed this in the past with Olivier; the choice of char was deliberate on his part. The implementation has been tested on a number of platforms (including time on ORNL's Summit).

The following comment, preserved from libc++, should be considered carefully before any change here:

"2. A great deal of attention has been paid to avoid cache line thrashing by flattening the tree structure into cache-line sized arrays, that are indexed in an efficient way."

It is my opinion that the bar for making a change here is high. I would need to see benchmark numbers that weigh the performance differences under various contention scenarios against the cache impact of fitting the entire tree in a single cache line using char, versus four or eight cache lines using the type favored by futex or ulock_wait/wake.
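
To make the cache-footprint trade-off concrete, here is a minimal sketch of the arithmetic. It assumes a 64-byte cache line and a flattened array of 64 slots; both numbers, and the names `kCacheLine`, `kSlots`, and `cache_lines_for`, are illustrative assumptions and are not taken from the libstdc++ or libc++ sources. The 4-byte type stands in for what futex accepts and the 8-byte type for what ulock_wait/wake accepts.

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative assumptions only: a 64-byte cache line and 64 slots.
constexpr std::size_t kCacheLine = 64;
constexpr std::size_t kSlots = 64;

// Number of cache lines needed to hold `slots` contiguous elements of T,
// rounded up to a whole line.
template <typename T>
constexpr std::size_t cache_lines_for(std::size_t slots = kSlots) {
    return (slots * sizeof(T) + kCacheLine - 1) / kCacheLine;
}

// 1-byte state: the whole array fits in one line.
static_assert(cache_lines_for<unsigned char>() == 1, "char: one cache line");
// 4-byte state (futex-sized): four lines.
static_assert(cache_lines_for<std::uint32_t>() == 4, "uint32_t: four cache lines");
// 8-byte state (ulock-sized): eight lines.
static_assert(cache_lines_for<std::uint64_t>() == 8, "uint64_t: eight cache lines");
```

Under these assumptions, widening the element type multiplies the number of lines that waiting and arriving threads touch, which is the cost any benchmark would have to set against the benefit of waiting directly on a futex-compatible word.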