Hello,

A fast concurrent FIFO queue, version 1.3

Author: Amine Moulay Ramdane

Description: A very fast concurrent FIFO queue that satisfies many requirements: it has more parallelism than the two-lock algorithm, it is FIFO fair, it is starvation-free, and it efficiently minimizes cache-coherence traffic. It is also energy efficient on the pop() side when you set the wait parameter to true in the constructor: when there are no items in the queue it will not spin-wait, but will block on my SemaMonitor. When the wait parameter of the constructor is set to false, it uses only one atomic increment on the push() side and one atomic increment on the pop() side, so it is very fast.

The number of threads on the pop() side is limited by the length of the queue, and the length of the queue must be greater than or equal to 2^10; I have set it like that. The size of the queue must be passed to the constructor and it must be a power of 2.

You have 3 options for setting the kind of lock; just look inside defines.inc. If you want to use my scalable lock called scalable MLock, uncomment the MLock option inside defines.inc; if you want the Ticket Spinlock, uncomment the TicketSpinlock option; and if you want the Spinlock, uncomment the Spinlock option. The Ticket Spinlock option scored 12.5 million transactions per second on my 2.4 GHz quad-core.

Here is my explanation of my algorithm; you will find the source code here:
https://sites.google.com/site/aminer68/concurrent-fifo-queue-1
We begin with the push() method; its source code looks like this:

===
function TWQueue.push(tm: long): boolean;
var
  lastHead, newtemp: long;
  k: integer;
begin
  result := true;
  newTemp := LockedIncLong(head);
  lastHead := newtemp - 1;
  while getlength1(newtemp) > fsize do
    {$IFDEF FPC} ThreadSwitch; {$ENDIF}
    {$IFDEF Delphi} sleep(0); {$ENDIF}
  repeat
    if fcount1^[lastHead and fMask].flag1 = 0 then break;
    {$IFDEF FPC} ThreadSwitch; {$ENDIF}
    {$IFDEF Delphi} sleep(0); {$ENDIF}
  until false;
  setObject(lastHead, tm);
  fcount1^[lastHead and fMask].flag1 := 1;
end;
===
Now we look at the rest of the code. Every cell of the array-based queue looks like this:

type
  cell = record
    obj: long;
    flag1: long;
    flag2: long;
    {$IFDEF CPU32} cache: typecache2; {$ENDIF CPU32}
    {$IFDEF CPU64} cache: typecache3; {$ENDIF CPU64}
  end;

The cell is padded to a cache-line size, and the array is aligned on 64 bytes, so as to avoid false sharing.

When a thread enters the push() method it increments "head" with an atomic increment, like this:

newTemp := LockedIncLong(head);
lastHead := newtemp - 1;

After that the thread tests whether "getlength1(newtemp) > fsize", that is, whether we have gone beyond the length of the queue. If we have gone beyond the length of the queue, the thread spin-waits; if not, it continues and tests "if fcount1^[lastHead and fMask].flag1 = 0", that is, whether there is no item in the cell. If fcount1^[lastHead and fMask].flag1 is not equal to 0, the thread spin-waits; if it is equal to zero, there is no item in the cell, so the thread writes the item into the cell by doing setObject(lastHead, tm), and after that sets the flag of the cell to 1, so that pop() can read from it, like this: fcount1^[lastHead and fMask].flag1 := 1.

So, as you have noticed, I have reasoned about and explained the push() side, and I think my reasoning is correct here. Now here is the new pop() method:

==
function TWQueue.pop(var obj: long): boolean;
var
  lastTail, newtemp: long;
  k: integer;
begin
  result := true;
  newTemp := LockedIncLong(temp);
  lastTail := newtemp - 1;
  repeat
    if fcount1^[lastTail and fMask].flag1 = 1 then break;
    {$IFDEF FPC} ThreadSwitch; {$ENDIF}
    {$IFDEF Delphi} sleep(0); {$ENDIF}
  until false;
  obj := getObject(lastTail);
  repeat
    if fcount1^[lastTail and fMask].flag2 = 1 then break;
    {$IFDEF FPC} ThreadSwitch; {$ENDIF}
    {$IFDEF Delphi} sleep(0); {$ENDIF}
  until false;
  fcount1^[lastTail and fMask].flag2 := 0;
  fcount1^[lastTail and fMask].flag1 := 0;
  tail := newtemp;
  fcount1^[newtemp and fMask].flag2 := 1;
end;
==
So, as you have noticed, we have to reason about this algorithm to prove that all is correct, so follow along with me please. When a thread enters the pop() method it atomically increments the "temp" variable, and after that it waits for fcount1^[lastTail and fMask].flag1 to equal 1, that is, it waits for an item to become available in cell number "lastTail and fMask". Then it gets the item from the cell, and after that it waits for fcount1^[lastTail and fMask].flag2 to equal 1, that is, it waits for the previous pop thread, on lastTail-1, to set fcount1^[lastTail and fMask].flag2 to 1. After that the thread sets flag2 to 0, and then sets flag1 to 0 so that the push side can put an item into the cell. Even though the thread sets flag1 to 0 before incrementing "tail", the algorithm is correct, since the push threads are limited in how much they can push by the length of the queue. After that the thread advances the tail with "tail := newtemp", and then signals the next popping thread by setting its flag2 to 1. So I think my algorithm is correct and efficient.

Note: The number of threads on the pop() side is limited by the length of the queue, and the length of the queue must be greater than or equal to 2^10; I have set it like that.

So, as you have noticed, I have reasoned about my algorithm, and I think my algorithm is correct now.
Finally, I have benchmarked my algorithm without using my SemaMonitor, and it scored 12.5 million transactions per second on my 2.4 GHz quad-core. It efficiently minimizes cache-coherence traffic and reduces contention, so it can scale better with more and more cores.

You can download my new algorithm of a very fast concurrent FIFO queue from:

https://sites.google.com/site/aminer68/concurrent-fifo-queue-1

Thank you,

Amine Moulay Ramdane.
--
---
You received this message because you are subscribed to the Google Groups
"Scalable Synchronization Algorithms" group.