Re: [fpc-devel] LockFree Queue algorithm

2008-01-27 Thread DarekM

Micha Nelissen pisze:

DarekM wrote:
  

Hi
This is my proposition of algorithm and its implementing multithreaded
FIFO queue without lock.



Hmm 'Push' and 'Pop' sound like a stack, but the implementation seems to
implement a FIFO indeed, with a head and tail.
  
It should be enqueue and dequeue. I was left  because first I try 
implement stack.



You're sure about the situation where two threads are accessing head and
tail at the same time, and there is only one item in your FIFO ?
  
Of course (if  CAS (interlockedCompareExchange)  has proper 
implementation , that, I think, should be avoided

Queue is empty when tail=head,
when is not equal only one thread can receive object form queue,  second 
make again iteration



The same with ABA problem.


Darek
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


Re: [fpc-devel] LockFree Queue algorithm

2008-01-27 Thread Micha Nelissen
DarekM wrote:
 Hi
 This is my proposition of algorithm and its implementing multithreaded
 FIFO queue without lock.

Hmm 'Push' and 'Pop' sound like a stack, but the implementation seems to
implement a FIFO indeed, with a head and tail.

You're sure about the situation where two threads are accessing head and
tail at the same time, and there is only one item in your FIFO ?

Micha
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


Re: [fpc-devel] LockFree Queue algorithm

2008-01-27 Thread Martin Friebe

What about a long running (eg daemon) application?

If temp/tail hits the upper boundary of Integer?

(If I understand it correctly)
I don't know if interlockedIncrement gives a boundary error, but if not,
it still fails.
- With currently integer, it gets a negative value, once crossing
0x7fff, and SetObject will attempt to read/write out-of-bounds memory.
- Assuming temp/tail being unsigned: it will go from 0x to 0.
0x mod fsize may return a value greater 0, 0x00 mod fsize
will be zero. You make an unexpected jump within the list.

Martin

DarekM wrote:

Hi
This is my proposition of algorithm and its implementing multithreaded 
FIFO queue without lock.

First use array of pointers to handle messages.
I've use it in my program, it works.
I think it may by useful.

site: http://www.emadar.com/fpc/lockfree.htm
source: http://www.emadar.com/fpc/flqueue.pas

There is also generic implementation of the same algorithm. Its my 
first generic approach.

Any help appreciated.


it;s not so much text, but  source is very short, its explain much 
better.



Darek
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


Re: [fpc-devel] LockFree Queue algorithm

2008-01-27 Thread DarekM

Martin Friebe pisze:

What about a long running (eg daemon) application?

If temp/tail hits the upper boundary of Integer?

(If I understand it correctly)
I don't know if interlockedIncrement gives a boundary error, but if not,
it still fails.
- With currently integer, it gets a negative value, once crossing
0x7fff, and SetObject will attempt to read/write out-of-bounds 
memory.

- Assuming temp/tail being unsigned: it will go from 0x to 0.
0x mod fsize may return a value greater 0, 0x00 mod fsize
will be zero. You make an unexpected jump within the list.




Thats right.
But I can avoid this.
1. make length of tab equal power of two
2. use longword instead of integer
3.instead MOD use   result:=tab[lp and  fmask];
and fmask:=$F or $FF or $FFF


Thanks to find bug.


Darek




___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


Re: [fpc-devel] LockFree Queue algorithm

2008-01-27 Thread DarekM

Martin Friebe pisze:

You will need to test it, but the following may also work

procedure tFlQueue.push(const tm : tNodeQueue);
var
 newTemp,
 lastTail,
 newTail : integer;
begin
 newTemp := temp;
 while newTemp = fsize begin
   // if it is true, every thread is going to attempt to fix it, 
before doing the increment.

   // this means no thread will increase the value
   //  = one thread will to succeed (since the only reason temp NE 
newTemp is that temp has been decreased)
   // newTemp is bigger than fsize, so the result can not become 
negative.

interlockedCompareExchange(pointer(temp),pointer(newTemp-fsize),pointer(newTemp)); 


newTemp := temp;
 end;
newTemp:=interlockedIncrement(temp) mod fsize; // make sure we are 
always in range

 lastTail:=newTemp-1;
 if lastTail  0 then lastTail := fsize -1;
 setObject(lastTail,tm); // you can remove the mod in setobject
 repeat
   
pointer(newTail):=interlockedCompareExchange(pointer(tail),pointer(newTemp),pointer(lastTail)); 


 until (newTail=lastTail);
end;


It seems ok, but then we have 2 IF more.
And the same should repeat with pop




The repeat at the end does still work. In order to set tail to 0 
(newTemp=0) = tail must first get 9 (as lastTail will be 9).


Only flaw is, if fsize is close enough to the upper data tpye 
boundary, and enough threads do InterLockedInvrement, (after all 
having simultaneously passed the repeat at the start), then you can 
still run over the boundary.
However this can only happen if you have more threads, than 
(upperBoundary - fsize).

So you can specify, that it is save up to this amount of parallel threads
Because we cant change size of queue in run, we have to make it so big 
that always will be enough.
For me it should be ten time bigger, than observe maximum (1000 its only 
4 kB of ram).






IMHO the unit should have a warning (in the comments) that there it is 
the users responsibility *not* to store more elements than fsize. 
There is no check, and you will loose data, if you try to do so.

Ok, Ive write warning in source


And i've upload new version  of program.


Darek

___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


Re: [fpc-devel] LockFree Queue algorithm

2008-01-27 Thread Florian Klaempfl
DarekM schrieb:
 Martin Friebe pisze:
 You will need to test it, but the following may also work

 procedure tFlQueue.push(const tm : tNodeQueue);
 var
  newTemp,
  lastTail,
  newTail : integer;
 begin
  newTemp := temp;
  while newTemp = fsize begin
// if it is true, every thread is going to attempt to fix it,
 before doing the increment.
// this means no thread will increase the value
//  = one thread will to succeed (since the only reason temp NE
 newTemp is that temp has been decreased)
// newTemp is bigger than fsize, so the result can not become
 negative.

 interlockedCompareExchange(pointer(temp),pointer(newTemp-fsize),pointer(newTemp));

 newTemp := temp;
  end;
 newTemp:=interlockedIncrement(temp) mod fsize; // make sure we are
 always in range
  lastTail:=newTemp-1;
  if lastTail  0 then lastTail := fsize -1;
  setObject(lastTail,tm); // you can remove the mod in setobject
  repeat
   
 pointer(newTail):=interlockedCompareExchange(pointer(tail),pointer(newTemp),pointer(lastTail));

  until (newTail=lastTail);
 end;

 It seems ok, but then we have 2 IF more.

An if is unimportant, more important is the number of locked operations,
especially on multi core systems they might eat hundreds of clock cycles.
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


interlocked commands [Re: [fpc-devel] LockFree Queue algorithm]

2008-01-27 Thread Martin Friebe
While watching this thread, I started to ask myself 2 questions (which 
may be related):


They just came to mind a multi-core systems where mentioned, possible 
even systems with several CPUs.
(Admiringly they are more looking like they should be on an 
intel-mailing list, I just hope someone may know...)


1)
procedure tFLQueue.setObject(lp : integer;const aobject : tNodeQueue);
begin
tab[lp and fMask]:=aObject;
end;

The index (lp and fMask) has been derived via interlockedIncrement, 
and the surrounding code makes sure, that only one thread will access 
this value at this time.


But lets assume the value was read immediately before, by another 
core/cpu. It therefore is in that core/cpu's cache. Will this cache be 
invalidated/updated by a *simple* write to memory? Or will the other 
core/cpu see the old value from it's cache?
I am no expert on this, but from the page referred below, locked 
cpu-instructions, take special care of this.

I don't know about unlocked instructions?

http://static.scribd.com/docs/59o7jahfstz7r.swf?INITIAL_VIEW=width
chapter 7

Because frequently used memory locations are often
cached in a processor's L1 or L2 caches, atomic operations can often 
be carried out
inside a processor's caches without asserting the bus lock. Here the 
processor's
cache coherency protocols insure that other processors that are 
caching the same
memory locations are managed properly while atomic operations are 
performed on

cached memory locations.
caching the same memory locations are managed properly while atomic 
operations are performed
What does the cache coherency do (if anything) while non atomic 
operations are performed?


2)
I found various references that interlockedIncrement and co, work only 
on 32 bit bounded data? This may or may not vary on the CPU.
The Intel doc only says, it will affect execution time, but looking at 
the MS doc http://msdn2.microsoft.com/en-us/library/ms683614.aspx it 
says it must be on a 32bit boundary.

Does that affect FPC? ( as there may be none intel CPUs?)

If so, then the Implementation of the Queue would have to ensure 
alignment (as I believe fpc, aligns integer on 16 bit?)


Martin

Florian Klaempfl wrote:

DarekM schrieb:
  

Martin Friebe pisze:


You will need to test it, but the following may also work

procedure tFlQueue.push(const tm : tNodeQueue);
var
 newTemp,
 lastTail,
 newTail : integer;
begin
 newTemp := temp;
 while newTemp = fsize begin
   // if it is true, every thread is going to attempt to fix it,
before doing the increment.
   // this means no thread will increase the value
   //  = one thread will to succeed (since the only reason temp NE
newTemp is that temp has been decreased)
   // newTemp is bigger than fsize, so the result can not become
negative.
   
interlockedCompareExchange(pointer(temp),pointer(newTemp-fsize),pointer(newTemp));


newTemp := temp;
 end;
  

newTemp:=interlockedIncrement(temp) mod fsize; // make sure we are
always in range


 lastTail:=newTemp-1;
 if lastTail  0 then lastTail := fsize -1;
 setObject(lastTail,tm); // you can remove the mod in setobject
 repeat
  
pointer(newTail):=interlockedCompareExchange(pointer(tail),pointer(newTemp),pointer(lastTail));


 until (newTail=lastTail);
end;

  

It seems ok, but then we have 2 IF more.



An if is unimportant, more important is the number of locked operations,
especially on multi core systems they might eat hundreds of clock cycles.
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
  

___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


[fpc-devel] Fpc Target directory function

2008-01-27 Thread L

Does a function like this exist:

function FpcTargetDir: string;
begin
 result:= {$I %FPCTARGETCPU%}+'-'+{$I %FPCTARGETOS%};
end;

Above works for my needs, just wonder if something already exists in RTL 
or if it should be added for convenience.



___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


Re: [fpc-devel] LockFree Queue algorithm

2008-01-27 Thread DarekM

Florian Klaempfl pisze:

An if is unimportant, more important is the number of locked operations,
especially on multi core systems they might eat hundreds of clock cycles.
  
There are atomic operations, the should not  eat much more than ordinal 
INC or ADD,

and second CAS is invoked only one time per fSize iteriations.


Darek


__


___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


Re: [fpc-devel] Fpc Target directory function

2008-01-27 Thread Peter Vreman
 Does a function like this exist:

 function FpcTargetDir: string;
 begin
   result:= {$I %FPCTARGETCPU%}+'-'+{$I %FPCTARGETOS%};
 end;

 Above works for my needs, just wonder if something already exists in RTL
 or if it should be added for convenience.

It doesn't exists and i don't see a reason why it should be added. The above 
line is simple. And
there is no generic rule that says that you need to encode TargetDir like 
cpu-os for every
program.

Peter


___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


Re: [fpc-devel] LockFree Queue algorithm

2008-01-27 Thread Florian Klaempfl
DarekM schrieb:
 Florian Klaempfl pisze:
 An if is unimportant, more important is the number of locked operations,
 especially on multi core systems they might eat hundreds of clock cycles.
   
 There are atomic operations, the should not  eat much more than ordinal
 INC or ADD,

If course they are. They require no more a bus lock as in 486 times but
a synchronization of all caches for this particular memory location.
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel