While watching this thread, I started to ask myself 2 questions (which may be related):

They just came to mind a multi-core systems where mentioned, possible even systems with several CPUs. (Admiringly they are more looking like they should be on an intel-mailing list, I just hope someone may know...)

1)
procedure tFLQueue.setObject(lp : integer;const aobject : tNodeQueue);
begin
tab[lp and fMask]:=aObject;
end;

The index ("lp and fMask") has been derived via "interlockedIncrement", and the surrounding code makes sure, that only one thread will access this value at this time.

But lets assume the value was read immediately before, by another core/cpu. It therefore is in that core/cpu's cache. Will this cache be invalidated/updated by a *simple* write to memory? Or will the other core/cpu see the old value from it's cache? I am no expert on this, but from the page referred below, "lock"ed cpu-instructions, take special care of this.
I don't know about unlocked instructions?

http://static.scribd.com/docs/59o7jahfstz7r.swf?INITIAL_VIEW=width
chapter 7
Because frequently used memory locations are often
cached in a processor's L1 or L2 caches, atomic operations can often be carried out inside a processor's caches without asserting the bus lock. Here the processor's cache coherency protocols insure that other processors that are caching the same memory locations are managed properly while atomic operations are performed on
cached memory locations.
"caching the same memory locations are managed properly while atomic operations are performed" What does the cache coherency do (if anything) while non atomic operations are performed?

2)
I found various references that interlockedIncrement and co, work only on 32 bit bounded data? This may or may not vary on the CPU. The Intel doc only says, it will affect execution time, but looking at the MS doc http://msdn2.microsoft.com/en-us/library/ms683614.aspx it says it must be on a 32bit boundary.
Does that affect FPC? ( as there may be none intel CPUs?)

If so, then the Implementation of the Queue would have to ensure alignment (as I believe fpc, aligns integer on 16 bit?)

Martin

Florian Klaempfl wrote:
DarekM schrieb:
Martin Friebe pisze:
You will need to test it, but the following may also work

procedure tFlQueue.push(const tm : tNodeQueue);
var
 newTemp,
 lastTail,
 newTail : integer;
begin
 newTemp := temp;
 while newTemp >= fsize begin
   // if it is true, every thread is going to attempt to fix it,
before doing the increment.
   // this means no thread will increase the value
   //  => one thread will to succeed (since the only reason "temp NE
newTemp" is that temp has been decreased)
   // newTemp is bigger than fsize, so the result can not become
negative.
interlockedCompareExchange(pointer(temp),pointer(newTemp-fsize),pointer(newTemp));

    newTemp := temp;
 end;
newTemp:=interlockedIncrement(temp) mod fsize; // make sure we are
always in range
 lastTail:=newTemp-1;
 if lastTail < 0 then lastTail := fsize -1;
 setObject(lastTail,tm); // you can remove the "mod" in setobject
 repeat
pointer(newTail):=interlockedCompareExchange(pointer(tail),pointer(newTemp),pointer(lastTail));

 until (newTail=lastTail);
end;

It seems ok, but then we have 2 IF more.

An if is unimportant, more important is the number of locked operations,
especially on multi core systems they might eat hundreds of clock cycles.
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel

Reply via email to