https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91719
--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Tried:
unsigned int a;
int
main ()
{
  for (int i = 0; i < 100000000; i++)
    __atomic_store_n (&a, i, __ATOMIC_SEQ_CST);
  return 0;
}
and:
unsigned int a;
int
main ()
{
  for (int i = 0; i < 100000000; i++)
    __atomic_exchange_n (&a, i, __ATOMIC_SEQ_CST);
  return 0;
}
and got (microbenchmark, sure):
Intel(R) Core(TM) i9-7960X CPU @ 2.80GHz
user    0m1.045s
user    0m0.441s
Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz
user    0m1.216s
user    0m0.529s
Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
user    0m1.185s
user    0m0.627s
Intel Xeon E312xx (Sandy Bridge)
user    0m1.600s
user    0m0.846s
Quad-Core AMD Opteron(tm) Processor 8354
user    0m1.720s
user    0m0.724s