On Tuesday, 1 April 2014 at 23:05:55 UTC, dajones wrote:
x86 uses something called (IIRC) a "store forwarding buffer".
Essentially it keeps track of stores until they have been
completed. Any time you read from an address, the store forwarding
buffer is checked first, then the caches and main memory. If it
can't do that, you have to wait for the store to finalize, and that
can be a lot slower again. If there's no pending store, it comes
from the cache.

Isn't it commonly called a store buffer? Most CPUs have one these
days. Indeed, stores are put in the store buffer until they are
completed (which can take some time, as the cache line may have to
be acquired from another core or from memory).
When you load, the CPU snoops the store buffer in parallel with the
L1 cache for a value.
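
To make the behaviour described above concrete, here is a rough toy model in Python. It is only a sketch under stated assumptions: `ToyCore`, `drain_one` and the dict standing in for "L1 cache plus memory" are made-up names for illustration, the lookup is done sequentially where real hardware checks the buffer and L1 in parallel, and the slow case where forwarding fails (for example a partially overlapping store) is not modelled at all.

```python
from collections import deque


class ToyCore:
    """Toy model of one core: a FIFO store buffer in front of a cache/memory dict."""

    def __init__(self, memory):
        self.memory = memory          # hypothetical stand-in for L1 cache + main memory
        self.store_buffer = deque()   # pending (address, value) pairs, oldest first

    def store(self, addr, value):
        # A store goes into the buffer; memory is not updated yet.
        self.store_buffer.append((addr, value))

    def load(self, addr):
        # Check the youngest pending store to this address first (store forwarding).
        for pending_addr, value in reversed(self.store_buffer):
            if pending_addr == addr:
                return value, "forwarded"
        # No pending store to this address: the value comes from the cache.
        return self.memory.get(addr, 0), "cache"

    def drain_one(self):
        # Commit the oldest pending store, as if the cache line had been acquired.
        if self.store_buffer:
            addr, value = self.store_buffer.popleft()
            self.memory[addr] = value


memory = {0x10: 1}
core = ToyCore(memory)
core.store(0x10, 42)
first = core.load(0x10)    # the pending store wins over the stale cached value
core.drain_one()
second = core.load(0x10)   # the store has been committed, so this reads the cache
print(first, second)
```

The point of the sketch is just the ordering: a load sees its own core's pending store before that store is visible in the cache, and once the buffer drains the same load is served from the cache instead.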