https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70490
Peter Cordes <peter at cordes dot ca> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |peter at cordes dot ca

--- Comment #5 from Peter Cordes <peter at cordes dot ca> ---
It would be great if there were a CPUID feature bit for (aligned?) 16B
loads/stores being atomic. IDK if that's likely to ever happen, but it might be
something we could ask CPU vendors for.

I suspect that a lot of current CPUs do in practice have this feature,
especially in single-socket systems. For example, Intel Haswell's internal data
paths between different layers of cache are all at least 32B wide.

I mention single-socket because narrower transfers in a coherency protocol can
cause tearing even on CPUs with a 16B data path to/from L1d. e.g. experimental
verification of non-atomicity on AMD Opteron 2435 (K10) with threads running on
separate sockets, connected with HyperTransport:
http://stackoverflow.com/questions/7646018/sse-instructions-which-cpus-can-do-atomic-16b-memory-operations/7647825#7647825

Still, CPUID could report something that was detected at power-on, if there are
still cases where multi-socket coherency traffic only supports 8B atomicity.

---

There's also a difference between atomicity guarantees for stuff like MMIO
observed by non-CPU system devices, vs. only WB regions of normal DRAM observed
by other CPUs. As I understand it, the current atomicity guarantees for aligned
accesses up to 64 bits apply even to uncached accesses, except for the P6
unaligned guarantee, which specifically applies only to *cached* accesses.

Relevant quotes extracted from Intel's manual, with commentary:
http://stackoverflow.com/questions/36624881/why-is-integer-assignment-on-a-naturally-aligned-variable-atomic
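
For reference, the kind of tearing experiment mentioned above can be sketched
roughly as follows. This is only an illustration, not the code from the linked
answer: it assumes x86-64 with SSE2, POSIX threads, and a compiler that emits a
single 16B move for each `_mm_store_si128` / `_mm_load_si128`. The names
`shared_vec`, `is_torn`, `writer`, and `count_torn_loads` are made up for this
sketch. The unsynchronized access is formally a data race in ISO C, so the
result is only meaningful as a hardware probe.

```c
/* Sketch: probe for tearing of aligned 16-byte SSE loads/stores.
   Assumes x86-64 + SSE2 + pthreads; build with e.g. gcc -O2 -pthread. */
#include <assert.h>
#include <emmintrin.h>   /* _mm_load_si128, _mm_store_si128, _mm_set1_epi64x */
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

static __m128i shared_vec;       /* the __m128i type is 16-byte aligned */
static atomic_int stop_writer;   /* tells the writer thread to exit */

/* The writer only stores vectors whose two 8-byte halves are equal,
   so a load with differing halves must have been torn. */
static int is_torn(uint64_t lo, uint64_t hi) { return lo != hi; }

static void *writer(void *arg) {
    (void)arg;
    uint64_t v = 0;
    while (!atomic_load_explicit(&stop_writer, memory_order_relaxed)) {
        /* Alternate between all-zeros and all-ones. */
        _mm_store_si128(&shared_vec, _mm_set1_epi64x((long long)v));
        atomic_signal_fence(memory_order_seq_cst);  /* compiler barrier only */
        v = ~v;
    }
    return NULL;
}

/* Returns the number of torn loads seen, or -1 if the thread
   could not be started. */
static long count_torn_loads(long iterations) {
    pthread_t t;
    long torn = 0;

    assert(((uintptr_t)&shared_vec & 15) == 0);
    atomic_store(&stop_writer, 0);
    shared_vec = _mm_setzero_si128();
    if (pthread_create(&t, NULL, writer, NULL) != 0)
        return -1;

    for (long i = 0; i < iterations; i++) {
        uint64_t halves[2];
        __m128i x = _mm_load_si128(&shared_vec);    /* aligned 16B load */
        atomic_signal_fence(memory_order_seq_cst);  /* force a fresh load each time */
        _mm_storeu_si128((__m128i *)halves, x);     /* split into 8B halves */
        torn += is_torn(halves[0], halves[1]);
    }

    atomic_store(&stop_writer, 1);
    pthread_join(t, NULL);
    return torn;
}
```

Calling `count_torn_loads()` from a `main()` with a large iteration count and
printing the result is enough to try it. A count of zero does not prove
atomicity; it only means this run did not observe tearing. To reproduce the
cross-socket case, the two threads would have to be pinned to cores on
different sockets (e.g. with `pthread_setaffinity_np` on Linux).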