I started this thread surprised by how little overhead there was, but it turned out that this holds only for the most modern CPUs. An overhead of 10% on some (or even most) CPUs is quite high.
However! What a modern CPU does today is what most other CPUs will do tomorrow, so `atomicArc` seems to have a bright future. On the other hand, the RISC-V people will ruin everything with their "worse is better, you don't need checked integer arithmetic or sane addressing modes" mentality. (Sorry to digress, but I really don't like RISC-V.)

But maybe `arc` vs `atomicArc` is currently the wrong question. Just turning on `--threads` seems to be a performance killer. I used to blame MinGW's thread-local storage implementation, but plenty of OS/CPU combinations seem to be affected. What is going on? Access to thread-local storage should not be this slow; it's supposed to be really fast...