@mikra I can't say yet whether my code will work, as I first have to get Nim to use vcc (I found [this thread](https://forum.nim-lang.org/t/2770)), but my approach is somewhat different. Firstly, I tried to always use the "right size" call, by delegating to the appropriate Windows function using "when sizeof(T) == 8: ..." style code. Secondly, like you, I used "exchange" to replace "store"; I could not find anything better, but I have seen people on Stack Overflow saying you should just set the value non-atomically and issue a fence afterward. Maybe that works, but I didn't like that solution. Thirdly, I think "load" is better replaced with "_InterlockedOr"; (x | 0) makes more sense to me than (x & F...).
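To make the second and third points concrete, here is a rough sketch in portable C11. It is not my Nim code and it does not call the Windows functions: the `<stdatomic.h>` operations stand in for the Interlocked calls, the helper names (`load_via_or`, `load_via_and`, `store_via_exchange`) are made up for illustration, and I fixed the type to 64 bits instead of dispatching on size.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helpers: C11 atomics stand in for the Interlocked calls. */

/* "load" as an atomic OR with 0: (x | 0) leaves the value unchanged,
   and the fetch returns the value that was there before the OR. */
static int64_t load_via_or(_Atomic int64_t *p) {
    return atomic_fetch_or(p, (int64_t)0);
}

/* The alternative: AND with all bits set also leaves the value unchanged. */
static int64_t load_via_and(_Atomic int64_t *p) {
    return atomic_fetch_and(p, ~(int64_t)0);
}

/* "store" as an exchange whose returned old value is simply discarded. */
static void store_via_exchange(_Atomic int64_t *p, int64_t v) {
    (void)atomic_exchange(p, v);
}
```

Both load variants return the same value; I only find the OR version easier to read. These calls use the default sequentially consistent ordering, so the sketch says nothing about the acquire/release variants.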
What I still don't understand is why both "_InterlockedOr64_acq" and "InterlockedOr64Acquire" (for example) seem to exist, doing the same thing.
