[ https://issues.apache.org/jira/browse/IGNITE-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16270474#comment-16270474 ]

Alexander Menshikov commented on IGNITE-4908:
---------------------------------------------

[~amashenkov] [~avinogradov]
I’ve implemented the new fast reentrant lock.

The main idea:
The current lock implementation is based on shared state in a cache, while the 
new lock uses the IgniteCache#invoke* methods to update the shared state 
atomically. The new implementation does not use a continuous query, so the 
cache can now be ATOMIC.
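The invoke-based update can be sketched roughly like this (all class, field, and method names here are hypothetical; in the real patch the equivalent logic would live in an EntryProcessor executed by IgniteCache#invoke on the primary node for the state key, which is what makes the whole acquire step atomic):

```java
import java.util.ArrayDeque;
import java.util.UUID;

/**
 * Minimal sketch of the acquire step the new lock would run atomically
 * via IgniteCache#invoke. Names are illustrative, not from the patch.
 */
public class LockStateSketch {
    /** Shared lock state kept as the cache value. */
    static final class LockState {
        UUID owner;                            // current holder, null if free
        int holdCount;                         // reentrancy counter
        final ArrayDeque<UUID> waiters = new ArrayDeque<>();
    }

    /**
     * Body of a would-be EntryProcessor#process: try to acquire the lock
     * for {@code id}; returns true when the lock is held after the call.
     */
    static boolean tryAcquire(LockState state, UUID id) {
        if (state.owner == null) {             // lock is free: take it
            state.owner = id;
            state.holdCount = 1;
            return true;
        }
        if (state.owner.equals(id)) {          // reentrant acquire
            state.holdCount++;
            return true;
        }
        if (!state.waiters.contains(id))       // queue for later hand-off
            state.waiters.add(id);
        return false;
    }

    public static void main(String[] args) {
        LockState state = new LockState();
        UUID a = UUID.randomUUID(), b = UUID.randomUUID();
        System.out.println(tryAcquire(state, a)); // true: lock was free
        System.out.println(tryAcquire(state, a)); // true: reentrant
        System.out.println(tryAcquire(state, b)); // false: b is queued
    }
}
```

Because the processor runs on the primary node for the state key, no separate continuous query is needed to observe the result, which is why the cache can be ATOMIC.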

The new lock comes in two flavors, fair and unfair, split into separate 
classes for performance reasons.

Benchmarks show (hardware: Core i5 (2nd gen) + 6 GB RAM):

|| Type || Number of nodes || Threads per node || us/op (fair mode) || us/op (unfair mode) ||
|Old lock| 1  | 1  | 696    ± 22    | 688   ± 19   |
|New lock| 1  | 1  | 30.5   ± 0.4   | 30.7  ± 0.4  |
|Old lock| 2  | 1  | 892    ± 36    | 855   ± 34   |
|New lock| 2  | 1  | 260    ± 5     | 267   ± 9    |
|Old lock| 5  | 1  | 8881   ± 1376  | 7087  ± 225  |
|New lock| 5  | 1  | 895    ± 30    | 884   ± 16   |
|Old lock| 10 | 1  | 38302  ± 1042  | 39484 ± 2466 |
|New lock| 10 | 1  | 2136   ± 112   | 2074  ± 75   |
|Old lock| 1  | 2  | 305    ± 10    | 288   ± 11   |
|New lock| 1  | 2  | 78     ± 2     | 8.6   ± 0.1  |
|Old lock| 1  | 10 | 1777   ± 105   | 10298 ± 894  |
|New lock| 1  | 10 | 507    ± 7     | 49    ± 1    |
|Old lock| 5  | 2  | 22027  ± 1789  | 15904 ± 2321 |
|New lock| 5  | 2  | 1635   ± 66    | 50    ± 3    |
|Old lock| 5  | 10 | 127560 ± 18090 | 97304 ± 6703 |
|New lock| 5  | 10 | 8479   ± 277   | 250   ± 5    |

Speed up: single thread + fair:   21.9x (1 node), 3.4x (2 nodes), 9.9x (5 nodes), 17.9x (10 nodes)
Speed up: single thread + unfair: 22.4x (1 node), 3.2x (2 nodes), 8.0x (5 nodes), 19.0x (10 nodes)

Speed up: multi-thread + fair:   3.9x (1 node, 2 threads), 3.5x (1 node, 10 threads), 13.5x (5 nodes, 2 threads), 15.0x (5 nodes, 10 threads)
Speed up: multi-thread + unfair: 33.5x (1 node, 2 threads), 210x (1 node, 10 threads), 318x (5 nodes, 2 threads), 389x (5 nodes, 10 threads)

Benchmark result details:
1) The unfair lock uses a local reentrant lock as a guard in front of the 
shared state for local synchronization. That allows performance close to a 
plain local reentrant lock.
2) A single server node is the primary for the shared state, which explains 
the extra performance boost on a single node.
3) The speed-up grows with the number of nodes.
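Point 1 can be illustrated with a toy sketch (names are mine, not from the patch): reentrant acquires by the same thread are served entirely by the local guard, so only the outermost acquire/release would need to touch the distributed shared state.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Illustrative sketch of the local-guard idea: a JVM-local ReentrantLock
 * sits in front of the distributed state, and only the outermost hold
 * performs a (simulated) distributed operation.
 */
public class LocalGuardSketch {
    private final ReentrantLock local = new ReentrantLock();

    /** Counts the operations that would hit the cache via IgniteCache#invoke. */
    final AtomicInteger globalOps = new AtomicInteger();

    void lock() {
        local.lock();
        if (local.getHoldCount() == 1)       // only the outermost acquire
            globalOps.incrementAndGet();     // ...would update shared state
    }

    void unlock() {
        if (local.getHoldCount() == 1)       // only the outermost release
            globalOps.incrementAndGet();     // ...would update shared state
        local.unlock();
    }
}
```

For example, lock(); lock(); unlock(); unlock() performs only two simulated distributed operations; the inner reentrant pair is resolved locally, which is where the unfair mode's speed comes from.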


What is NOT implemented:

1) Conditions.

Why:
It's better to create a separate JIRA issue for this task because the 
implementation could be complicated. It also requires additional benchmarks 
and tests.

2) Removing shared lock state from the cache when it is no longer needed.

Why:
In the old reentrant lock, the #close method can remove the shared state from 
the cache while other nodes are still using it (although the lock should be 
free at that moment). Doing this safely (which I suppose we should) is not a 
trivial task either. There are many details to settle, for example:
- Should the #close method wait until all threads on the local node have 
released the lock?
- If yes, how do we handle an Ignite#close() call (it may hang while waiting 
for the release in lock#close)?
And so on. This also needs a separate JIRA issue and a discussion.
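One possible shape for a safer #close, purely as a sketch for the discussion (the names, the blocking behavior, and the use of a plain ReentrantLock as the local guard are all my assumptions, not the patch):

```java
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical sketch: #close refuses new local acquires, then waits for
 * the current local holder to release before removing the shared state.
 * (A real version would also have to coordinate with remote nodes.)
 */
public class SafeCloseSketch {
    private final ReentrantLock localGuard = new ReentrantLock();
    private volatile boolean closed;

    void lock() {
        if (closed)
            throw new IllegalStateException("lock is closed");
        localGuard.lock();
        // ...acquire the shared state via IgniteCache#invoke here...
    }

    void unlock() {
        // ...release the shared state here...
        localGuard.unlock();
    }

    /** Blocks until no local thread holds the guard, then removes state. */
    void close() {
        closed = true;         // stop new local acquires (racy: a sketch only)
        localGuard.lock();     // wait for the current holder to release
        try {
            // ...remove the shared state from the cache here...
        } finally {
            localGuard.unlock();
        }
    }
}
```

This makes the trade-off concrete: a close that waits is safe for other local threads but is exactly what could make Ignite#close() hang.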

3) Failover = false mode.

Why:
IgniteCache#invoke* makes failover easy to implement, so in the current 
implementation failover is always switched on (and its price is very low). 
But if you don't want to lose the shared state, you have to use REPLICATED 
mode (or set some backups) plus FULL_SYNC for the cache (see 
IgniteReentrantLockTest). The same is true for the old lock, which is why I 
left users the ability to set those options.
The idea of a breakable lock object (as in the old locks) seems weird, 
because you cannot find out that the lock was broken somewhere while you are 
under the lock but before calling #unlock.
To be honest, I don't see a reason to support a failover-off mode in a 
distributed system.
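The recommended cache options look roughly like this (a sketch only: the cache name is made up, and how the new lock's internal cache actually picks these options up is implementation-specific; REPLICATED/backups plus FULL_SYNC is what the text above recommends):

```java
// Cache options that keep the shared lock state redundant so a node
// failure does not lose it. All setters below are standard
// CacheConfiguration methods; "locksCache" is a hypothetical name.
CacheConfiguration<Object, Object> ccfg = new CacheConfiguration<>("locksCache");

ccfg.setCacheMode(CacheMode.REPLICATED);          // or PARTITIONED + ccfg.setBackups(n)
ccfg.setWriteSynchronizationMode(
    CacheWriteSynchronizationMode.FULL_SYNC);     // wait for all copies on write
ccfg.setAtomicityMode(CacheAtomicityMode.ATOMIC); // the new lock allows an atomic cache
```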

4) Some status methods: hasWaiters, getWaitQueueLength, hasQueuedThread, 
hasQueuedThreads.

Why:
I simply haven't had enough time. I think these methods are not important for 
the first step; we can discuss more important topics now and leave the status 
methods for the future.

5) A method to configure how long the lock is held without reacquiring. Right 
now it is hardcoded at 50 ms.

Why:
The same reason as for the status methods above. Also, I think we could 
derive the time from the number of nodes (e.g. 5*n ms). So it needs a short 
discussion too.



FAQ:

1) Can you create a common super class for GridCacheLockImpl2Unfair.GlobalSync 
and GridCacheLockImpl2Fair.Sync to remove some code duplication?

I have tried, but the main problem is how Java generics work. Look at 
lockView and you will see that IgniteInternalCache<GridCacheInternalKey, 
GridCacheLockState2Base<?>> is not castable to 
IgniteInternalCache<GridCacheInternalKey, GridCacheLockState2Base<LockOwner>> 
or to IgniteInternalCache<GridCacheInternalKey, GridCacheLockState2Base<UUID>>. 
So all operations on the cache (which are the main operations in the *Sync 
classes) have to be duplicated. It's the price of type safety.

Most of the other lines differ slightly between the fair and unfair 
implementations, so there are very few lines that could be extracted into a 
super class, and I'm not sure we should create one for such a small gain.
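The generics problem can be reproduced in isolation with toy types (nothing Ignite-specific; `State<T>` stands in for GridCacheLockState2Base<T>): Java generics are invariant, so a container typed with a wildcard state parameter is not assignable from a concretely-typed one.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy reproduction of the lockView cast problem: generics are invariant. */
public class InvarianceDemo {
    /** Stands in for GridCacheLockState2Base<T>. */
    static class State<T> { T owner; }

    public static void main(String[] args) {
        List<State<String>> typed = new ArrayList<>();

        // List<State<?>> wild = typed;        // does NOT compile: even though
        // State<String> is a State<?>, List<State<String>> is not a
        // List<State<?>> -- generic types are invariant in their parameter.

        // The only common supertype uses a bounded wildcard, and it is
        // effectively read-only, so write operations cannot live there:
        List<? extends State<?>> readOnly = typed;
        // readOnly.add(new State<String>());  // does NOT compile either

        System.out.println(readOnly.size());   // prints 0
    }
}
```

This is exactly why the cache operations in the *Sync classes cannot be pulled up into a shared super class without losing type safety.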


> Ignite.reentrantLock looks much slower than IgniteCache.lock.
> -------------------------------------------------------------
>
>                 Key: IGNITE-4908
>                 URL: https://issues.apache.org/jira/browse/IGNITE-4908
>             Project: Ignite
>          Issue Type: Improvement
>          Components: data structures
>    Affects Versions: 1.8
>            Reporter: Andrew Mashenkov
>            Assignee: Alexander Menshikov
>
> Design discussed with Alexander:
> 1) Lock 
> Entry Processor (sync) -> 
> ....add candidate. 
> ....returns "added candidate at first position"
> ....retry failover -> 
> ........if already at first position -> return true
> In case lock not acquired, wait for acquire (AbstractQueuedSynchronizer 
> should be used).
> 2) Unlock 
> Entry Processor (async) -> 
> ....remove candidate at first position
> ....retry failover -> remove only if "candidate at first position" equals to 
> expected
> ....listener ->
> ........notify current "candidate at first position" it got lock
> 3) Failover
> 3.1) Originating node failed
> Failed node listener ->
> ....For each local(primary) lock ->
> ........Entry Processor (async) ->
> ............remove candidates related to the failed node
> ............retry failover not needed
> ............listener -> 
> ................if "candidate at first position" removed ->
> ....................notify current "candidate at first position" it got lock
> 3.2) Primary node failed
> After rebalancing schedule Callable ->
> ....For each local(primary) lock ->
> ........Entry Processor (async) ->
> ............remove candidates related to failed nodes
> ............retry failover not needed
> ............listener -> 
> ................notify current "candidate at first position" it got lock



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
