interlocked commands [Re: [fpc-devel] LockFree Queue algorithm]

Martin Friebe Sun, 27 Jan 2008 15:30:47 -0800

While watching this thread, I started to ask myself 2 questions (which may be related):

They came to mind because multi-core systems were mentioned, possibly even systems with several CPUs. (Admittedly they look more like they belong on an Intel mailing list; I just hope someone here may know...)


1)
procedure tFLQueue.setObject(lp : integer;const aobject : tNodeQueue);
begin
tab[lp and fMask]:=aObject;
end;

The index ("lp and fMask") has been derived via "interlockedIncrement", and the surrounding code makes sure that only one thread will access this value at this time.

But let's assume the value was read immediately before by another core/CPU, so it is in that core/CPU's cache. Will this cache be invalidated/updated by a *simple* write to memory? Or will the other core/CPU see the old value from its cache?

I am no expert on this, but according to the document referenced below, "lock"ed CPU instructions take special care of this.

But what about unlocked instructions? I don't know.
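
The pattern being asked about can be sketched roughly as follows. This is only an illustration, not code from the queue: slot, ready, Producer and Consumer are hypothetical names, and the sketch assumes that the FPC version in use provides WriteBarrier/ReadBarrier and the longint variants of InterlockedExchange/InterlockedCompareExchange in the system unit. It shows a plain (unlocked) write to a slot that is only handed over to the other side through a locked operation.

```pascal
program PublishSketch;
{$mode objfpc}

var
  slot: pointer = nil;  // hypothetical stand-in for tab[lp and fMask]
  ready: longint = 0;   // hypothetical "slot is filled" flag

procedure Producer(value: pointer);
begin
  slot := value;                  // plain, unlocked write (the case in question)
  WriteBarrier;                   // assumption: keeps the write ordered before the flag
  InterlockedExchange(ready, 1);  // locked operation that publishes the slot
end;

function Consumer: pointer;
begin
  Result := nil;
  // Claim the slot only if the flag is set; the locked compare-exchange
  // returns the previous value of ready and resets it to 0 on success.
  if InterlockedCompareExchange(ready, 0, 1) = 1 then
  begin
    ReadBarrier;                  // assumption: read slot only after seeing the flag
    Result := slot;
  end;
end;

var
  payload: longint = 42;
begin
  // Single-threaded walk through the hand-over; a real test needs two threads.
  Producer(@payload);
  if Consumer = @payload then
    writeln('consumer saw the published slot');
end.
```

Whether the explicit barriers are needed at all on a given CPU is exactly the open question; the sketch only makes the ordering assumption visible.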

http://static.scribd.com/docs/59o7jahfstz7r.swf?INITIAL_VIEW=width
chapter 7

Because frequently used memory locations are often
cached in a processor's L1 or L2 caches, atomic operations can often
be carried out inside a processor's caches without asserting the bus lock.
Here the processor's cache coherency protocols insure that other processors
that are caching the same memory locations are managed properly while
atomic operations are performed on cached memory locations.

"caching the same memory locations are managed properly while atomic operations are performed"
What does the cache coherency protocol do (if anything) while non-atomic operations are performed?

2)

I found various references saying that interlockedIncrement and co. work only on data aligned to a 32-bit boundary. This may or may not vary by CPU. The Intel doc only says that alignment affects execution time, but the MS doc http://msdn2.microsoft.com/en-us/library/ms683614.aspx says the variable must be on a 32-bit boundary.

Does that affect FPC (as there may be non-Intel CPUs)?

If so, then the implementation of the queue would have to ensure alignment (as I believe FPC aligns integers on 16 bits?).
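
Whether the counters actually end up on a 32-bit boundary could be checked with something like the sketch below. TCounters and IsAligned32 are hypothetical names made up for the illustration; the real fields of tFLQueue may be declared and laid out differently, so the record only stands in for them.

```pascal
program AlignCheck;
{$mode objfpc}

type
  // Hypothetical record standing in for the queue's counters.
  TCounters = record
    temp: longint;
    tail: longint;
  end;

// True if the address p is a multiple of 4 bytes, i.e. on a 32-bit boundary.
function IsAligned32(p: pointer): boolean;
begin
  Result := (PtrUInt(p) and 3) = 0;
end;

var
  c: TCounters;
begin
  writeln('temp on 32-bit boundary: ', IsAligned32(@c.temp));
  writeln('tail on 32-bit boundary: ', IsAligned32(@c.tail));
end.
```

The same check could be placed in the queue's constructor as an assertion, so a misaligned field shows up immediately instead of as a rare failure of an interlocked call.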


Martin

Florian Klaempfl wrote:

DarekM wrote:

Martin Friebe wrote:

You will need to test it, but the following may also work

procedure tFlQueue.push(const tm : tNodeQueue);
var
  newTemp,
  lastTail,
  newTail : integer;
begin
  newTemp := temp;
  while newTemp >= fsize do
  begin
    // If this is true, every thread is going to attempt to fix it
    // before doing the increment, so no thread will increase the value.
    //  => one thread will succeed (the only reason for "temp <> newTemp"
    //     is that temp has already been decreased).
    // newTemp is at least fsize, so the result cannot become negative.
    interlockedCompareExchange(pointer(temp),pointer(newTemp-fsize),pointer(newTemp));
    newTemp := temp;
  end;

  // make sure we are always in range
  newTemp:=interlockedIncrement(temp) mod fsize;

  lastTail:=newTemp-1;
  if lastTail < 0 then lastTail := fsize -1;
  setObject(lastTail,tm); // you can remove the "mod" in setobject
  repeat
    pointer(newTail):=interlockedCompareExchange(pointer(tail),pointer(newTemp),pointer(lastTail));
  until (newTail=lastTail);
end;

It seems OK, but then we have two more IFs.


An if is unimportant; what matters more is the number of locked operations.
Especially on multi-core systems, they might eat hundreds of clock cycles.
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
