[lock-free] Scalable SelockX is here...

Amine Moulay Ramdane Mon, 09 Mar 2015 21:44:52 -0700

Hello,


Scalable SeqlockX version 1.0

Author: Amine Moulay Ramdane. 

Description:

This scalable Seqlock's variant was invented by Amine Moulay Ramdane, it is 
much faster and more powerfull than the the Dmitry Vyukov's distributed 
reader-writer mutex , cause the Dmitry Vyukov's distributed reader-writer 
lock will become slower and slower on the writer side with more and more 
cores because it transfers too many cache-lines between cores on the writer 
side, this invention has eliminated the weakness of the Dmitry Vyukov's 
distributed reader-writer mutex, so that the writers throughput has become 
faster and very fast, and my scalable SeqlockX elminates the weaknesses of 
the classical Seqlock (sequential lock) that is "livelock", so it avoids 
livelock when there is more writers, so my scalable SeqlockX is a powerful 
sychronization mechanism that can replace RCU and that can replace Seqlock 
and that can replace Dmitry Vyukov's distributed reader-writer mutex. And 
if you look at my algorithm you will notice that on the reader side it 
looks like a classical sequential lock, but i have added a variable called 
fcount2^.fcount2 that allows the readers to not livelock when there is more 
writers ,this scalable SeqlockX is faster and more scalable than the Dmitry 
Vyukov's distributed reader-writer mutex and it beats the classical Seqlock 
because it avoids livelock when there is many writers.

Please look at the "test.pas" example that i have included and you will 
notice that the reader side have to be used in a transactional mode with a 
repeat-until loop, like this:

repeat

t.rlock(id1);

until t.runlock(id1);

it is like a transactional mode of a Seqlock and id1 must be of type "long" 
that must be imported from the SeqlockX unit. The writer side is easy to 
program , please look at the "test.pas" example and you will understand how 
to use my scalable SeqlockX.

My scalable SeqlockX can be used from accross processes and threads. 

My new invention called scalable SeqlockX k beats the following algorithm, 
it is more scalable and faster than the following algorithm of Dmitry 
Vyukov called Distributed reader-writer mutex:

http://www.1024cores.net/home/lock-free-algorithms/reader-writer-problem/distributed-reader-writer-mutex
 

And my new invention called scalable SeqlockX beats also the classical 
Seqlock (i have just explained to you why ...), read here about it:

http://en.wikipedia.org/wiki/Seqlock

And this new invention called scalable SeqlockX can replace RCU cause it is 
a powerfull synchronization mechanism.

I will explain my new algorithm of a scalable distributed sequential lock 
that is very powerful...

I have arrived at my new algorithm by also reading many PhD papers and by 
looking first at the weakness of the following algorithm of Dmitry Vyukov's 
Reader-writer mutex:

http://www.1024cores.net/home/lock-free-algorithms/reader-writer-problem/distributed-reader-writer-mutex
 

look at the writer side of the Dmitry Vykov's algorithm , it is doing with 
every rwlock in each corresponding core:

for (i = 0; i != mtx->proc_count; i += 1)

pthread_rwlock_wrlock(&mtx-cell[i].mtx);

but this render this algorithm inefficient for the writer side , cause 
every pthread_rwlock_wrlock() call transfer many cache-lines betwenn cores, 
so this will generate too much cache line transfers between cores when each 
writer executes distr_rw_mutex_wrlock() and this will become worse and 
worse when you add more and more cores, i mean this will transfer more and 
more cache-lines between cores with more and more cores... so this will 
render the writer side throughput slow and this is the weakness of this 
algorithm...

So this is why i have used a sequential lock mechanism in the write side of 
my new algorithm so that to minimize efficiently the cache-lines transfers, 
so if you look at my algorithm, it adds the follwing to the classical 
Seqlock:

On the writer side if the result of TSeqlockX.RUnlock() method is "false", 
that means the reader transaction has failed, so i am putting 1 into the 
FCount2^.FCount2 variable and if it's true, that means the reader 
transaction has succeeded, i am putting 0 into the FCount2^.FCount2 
variable, and inside the writer side inside the TSeqlockX.WLock() method, i 
am doing this: 

while FCount2^.FCount2<>0 do sleep(0); 

So if FCount2^.FCount2 is equal to its initial value it will exit the while 
loop, and if FCount2^.FCount2 is equal to 1 it will wait for 
TSeqlockX.RUnlock() to return true, that means it will wait for a reader 
transaction to succeed, this part of the algorithm allows my algorithm to 
avoids livelock when there is many writers, and the rest of the algorithm 
look like a classical seqlock. That's all.

About the sequential consistency of my scalable distributed sequential 
lock, my algorithm works on x86 architecture and it compiles on Delphi and 
FreePascal compilers they both respect the strong memory model of the x86 
architecture. i think my algorithm is correct cause look at the source code 
of the WLock() method, since i am using a Ticket spinlock with a 
proportional backoff on the writer side, the Ticket spinlock is using a 
"lock add" assembler instruction to increment a counter in the Enter() 
method of the ticket spinlock , and this "lock add" assembler instruction 
is a barrier for stores and loads on x86, and the loads of Fcount2^.fcount2 
is not reordered with the previous atomic lock of the ticket spinlock, and 
the stores of FCount5^.fcount5 are not reordered with olher stores and 
older loads , so the WLock() method is sequentially consistent and correct, 
now look at the WUnlock() , we don't need an "sfence" cause stores and 
loads on WUnlock() are not reordered with older stores or older loads on 
x86 etc. so WUnlock() method is sequentially consistent and correct, now 
look at the RLock() method, it's also sequentially consistent., so all in 
all i think my algorithm is sequentially consistent and correct on x86 and 
on the Delphi and FreePascal compilers. So i think you can be confident 
cause i have reasoned correctly and i think my scalable SeqlockX algorithm 
is correct and it is a powerfull synchronization mechanism that can replace 
RCU and that can replace classical Seqlock cause it beats Seqlock.

You have to know also that the first parameter of the constructor is the 
name that identifies the scalable SeqlockX object accross processes 
bounderies.

You can download my scalable SeqlockX from:

https://sites.google.com/site/aminer68/scalable-seqlockx


Language: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Operating Systems: Windows, Mac OSX , Linux on x86.

Required FPC switches: -O3 -Sd -dFPC -dFreePascal

-Sd for delphi mode.... 

Required Delphi switches: -$H+ -DDelphi

{$DEFINE CPU32} and {$DEFINE Windows32} for 32 bit systems

{$DEFINE CPU64} and {$DEFINE Windows64} for 64 bit systems


Thank you,
Amine Moulay Ramdane.

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"Scalable Synchronization Algorithms" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/lock-free/920897fe-8fa3-4785-b323-d263757dc585%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

[lock-free] Scalable SelockX is here...

Reply via email to