On 13 Oct 2010, at 00:51, Andrew Brunner wrote:

The interesting thing I have noticed was that Arrays[n] of boolean can
be used without memory barriers.  There is not one lock associated
with the boolean arrays and it always proves non-problematic on a 6
core system with 4gig ram.  There are boolean value checks that I did
inside the loops to see if any values were assigned out-of-order and
over the hours of tests I ran across up to 1200 threads... not one
false positive!

See also http://en.wikipedia.org/wiki/Memory_ordering#cite_note-table-2 for an overview of what kind of memory reordering is performed by different architectures. It shows that x86 CPUs only perform one kind of memory reordering (unless the CPU supports oostore mode and is explicitly put into it). The reordering they perform by default is that a store which comes before a load in the program code can be executed after that load instead. This means that if you use a regular variable (such as a boolean) for synchronisation:

1) on entry of the "critical section" protected by this variable, you can have problems, because this sequence:

locked:=true;
local:=shared_global_var;

may actually be executed in this order:

local:=shared_global_var;
locked:=true;

So loads that belong inside the "critical section" can be executed speculatively, before the store that marks the section as locked.

2) when exiting the "critical section", there are no problems, because none of the loads or stores before the one that sets the boolean "lock" variable to false can be moved past that store.
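
To make the two cases above concrete, here is a minimal sketch of how the entry and the exit could be written with explicit barriers. It assumes an FPC version whose System unit provides ReadWriteBarrier and InterlockedCompareExchange; the names Locked, SharedGlobalVar, TryEnter and Leave are made up for this illustration. On x86 the interlocked operation may well be sufficient on its own, so treat the explicit barrier on entry as a conservative choice rather than a statement about what every platform requires.

```pascal
program BarrierSketch;

{$mode objfpc}

var
  // Hypothetical names, chosen only for this illustration.
  Locked: LongInt = 0;            // 0 = free, 1 = taken
  SharedGlobalVar: LongInt = 42;  // data protected by Locked

// Try to take the lock once; returns True if we got it.
function TryEnter(var Lock: LongInt): Boolean;
begin
  // Atomically set Lock to 1 only if it currently is 0.
  // The function returns the value Lock had before the call.
  Result := InterlockedCompareExchange(Lock, 1, 0) = 0;
  // Full barrier: loads that follow in program order must not be
  // executed before the store to Lock above (case 1).
  if Result then
    ReadWriteBarrier;
end;

// Release the lock.
procedure Leave(var Lock: LongInt);
begin
  // Full barrier: loads and stores from inside the critical section
  // must be complete before the store that frees the lock. x86 already
  // guarantees this (case 2); other architectures may not.
  ReadWriteBarrier;
  Lock := 0;
end;

var
  Local: LongInt;
begin
  if TryEnter(Locked) then
  begin
    Local := SharedGlobalVar;  // now ordered after the lock was taken
    Leave(Locked);
    WriteLn('read ', Local);
  end;
end.
```

The program is single-threaded and only shows where the barriers go; it does not demonstrate the reordering itself.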


In summary, the fact that a particular program runs fine on your particular machine does not mean anything: a) your particular machine may not perform any kind of reordering that results in problems, and b) your particular program may not expose any kind of reordering that results in problems.

That does not automatically mean that the program "can be used without memory barriers". It is virtually impossible to prove the correctness of multi-threaded code running on multiple cores through testing, and it is literally impossible to prove it for all possible machines by testing on a single machine (even if that machine has 4096 cores and runs 16000 threads), simply because other machines may use different memory consistency models.


Jonas
_______________________________________________
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
