https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59305

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |libgcc

--- Comment #29 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #28)
> Me too on a ThunderX, I thought it was due to an hardware errata too (where
> load acquire was not a memory barrier after a store release).

The problem turns out to be that pthread_mutex_lock/unlock is not fair. What
happens is that the newly created thread (which does the stores) tends to get
the lock more often than the other thread, which is doing the arithmetic
operations and is the time thread which is keeping count.

There are a few ways of fixing this. One is to loop on trylock a few thousand
times before falling through to the full mutex_lock [really, this should be
done this way in libc]. The other is to use spin locks (which does not fix
Darwin, as Darwin does not have pthread spinlocks).
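
For illustration, the first option (spin on trylock, then fall back to the blocking lock) could look roughly like the sketch below. This is not the patch from this bug: the helper name spin_then_lock and the attempt count SPIN_TRYLOCK_ATTEMPTS are hypothetical, and 4000 is only a stand-in for "a few thousand".

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical tuning knob standing in for "a few thousand" attempts. */
#define SPIN_TRYLOCK_ATTEMPTS 4000

/*
 * Hypothetical helper: poll the mutex with pthread_mutex_trylock a bounded
 * number of times, then fall through to the blocking pthread_mutex_lock.
 * Returns 0 once the mutex is held, or a pthread error number.
 */
int spin_then_lock(pthread_mutex_t *m)
{
    for (int i = 0; i < SPIN_TRYLOCK_ATTEMPTS; i++) {
        int rc = pthread_mutex_trylock(m);
        if (rc == 0)
            return 0;   /* acquired without blocking */
        if (rc != EBUSY)
            return rc;  /* a real error, not just contention */
    }
    /* Still contended after the polling phase: block as usual. */
    return pthread_mutex_lock(m);
}
```

The caller releases the mutex with pthread_mutex_unlock as usual. Polling only gives the waiting thread more chances to take the lock between the other thread's unlock and relock; it does not make the mutex fair.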