[jira] [Updated] (MRESOLVER-404) New strategy for Hazelcast named locks

Tamas Cservenak (Jira) Thu, 07 Sep 2023 03:47:04 -0700


     [ https://issues.apache.org/jira/browse/MRESOLVER-404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Tamas Cservenak updated MRESOLVER-404:
--------------------------------------
    Description: 
Originally (for the current behavior, see below), the Hazelcast NamedLock
implementation worked like this:
* on lock acquire, an ISemaphore Distributed Object (DO) with the lock name
was created (or just fetched, if it already existed) and ref-counted
* on lock release, if the refCount dropped to 0 uses, the ISemaphore was
destroyed (releasing HZ cluster resources)
* if, after some time, a new lock acquire happened for the same name, the
ISemaphore DO would get re-created.
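
The ref-counted lifecycle above can be sketched roughly as follows. This is an
illustration only, not the Resolver code: {{RefCountedSemaphores}} and its
method names are hypothetical, and a local {{java.util.concurrent.Semaphore}}
kept in a map stands in for the Hazelcast ISemaphore DO.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Semaphore;

/** Hypothetical sketch: local Semaphores stand in for Hazelcast ISemaphore DOs. */
final class RefCountedSemaphores {
    private static final class Holder {
        final Semaphore semaphore = new Semaphore(Integer.MAX_VALUE);
        int refCount;
    }

    private final Map<String, Holder> holders = new HashMap<>();

    /** On lock acquire: create the semaphore for this name (or get the existing one) and count the use. */
    synchronized Semaphore acquireSemaphore(String lockName) {
        Holder holder = holders.computeIfAbsent(lockName, n -> new Holder());
        holder.refCount++;
        return holder.semaphore;
    }

    /** On lock release: drop one use; once no uses remain, the semaphore is destroyed. */
    synchronized void releaseSemaphore(String lockName) {
        Holder holder = holders.get(lockName);
        if (holder == null) {
            throw new IllegalStateException("not acquired: " + lockName);
        }
        if (--holder.refCount == 0) {
            holders.remove(lockName); // stands in for destroying the DO in the cluster
        }
    }

    /** Number of semaphores currently alive. */
    synchronized int liveCount() {
        return holders.size();
    }
}
```

In this sketch a later acquire for the same name simply builds a fresh
instance, which mirrors the "would get re-created" step.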

Today, the HZ NamedLocks implementation works in the following way:
* there is only one Semaphore provider implementation, the
{{DirectHazelcastSemaphoreProvider}}, which maps the lock name 1:1 onto the
ISemaphore Distributed Object (DO) name and does not destroy the DO

The reason for this is historical: originally, the named locks precursor code
was written for Hazelcast 2/3, which used "unreliable" distributed operations,
and recreating a previously destroyed DO was possible (at the cost of
"unreliability").

Since 4.x, Hazelcast has moved to the Raft consensus algorithm, which made
things reliable, but at the cost that DOs, once created and then destroyed,
could not be recreated anymore. {{DirectHazelcastSemaphoreProvider}} was
adapted to this change by simply not dropping unused ISemaphores (release
semaphore is a no-op method).
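
A minimal sketch of this direct strategy, under the same assumptions as any
in-process illustration: {{DirectSemaphores}} is a hypothetical name (it is not
{{DirectHazelcastSemaphoreProvider}} itself), and local
{{java.util.concurrent.Semaphore}} instances stand in for ISemaphore DOs. It
shows why nothing is ever freed: release does nothing, so the map only grows.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

/** Hypothetical sketch of a 1:1 name-to-semaphore mapping with a no-op release. */
final class DirectSemaphores {
    private final Map<String, Semaphore> byName = new ConcurrentHashMap<>();

    /** The lock name is used directly as the semaphore name; create on first use. */
    Semaphore acquireSemaphore(String lockName) {
        return byName.computeIfAbsent(lockName, n -> new Semaphore(Integer.MAX_VALUE));
    }

    /** Intentionally a no-op: a destroyed DO could not be recreated, so it is kept. */
    void releaseSemaphore(String lockName) {
        // nothing to do
    }

    /** Number of semaphores held; never decreases in this strategy. */
    int liveCount() {
        return byName.size();
    }
}
```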

But this has an important consequence: a long-running Hazelcast cluster will
accumulate more and more ISemaphore DOs (basically as many as there are
Artifacts encountered by all the builds that use this cluster to coordinate).
The number of Artifacts out there is not infinite, but it is large enough --
especially if the cluster is shared across many different/unrelated builds --
to grow over a sane limit.

So, the current recommendation is to have a "large enough" dedicated Hazelcast
cluster and use {{semaphore-hazelcast-client}} (a "thin client" that connects
to the cluster) instead of {{semaphore-hazelcast}} (a "thick client" that puts
the burden onto the JVM process running it as a node, hence onto Maven as
well). But even then, a regular reboot of the cluster may be needed.

A proper but somewhat complicated solution would be to introduce some sort of
indirection: create only as many ISemaphores as needed at the moment, and map
those onto the lock names in use at the moment (reusing unused semaphores).
The problem is that the mapping would need to be distributed as well (so all
clients pick it up, or perform a new mapping), and this may cause a
performance penalty. But this could be proven only by exhaustive perf testing.
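
One possible shape of such an indirection, as a single-JVM sketch only:
{{PooledSemaphores}} and its members are hypothetical names, local
{{java.util.concurrent.Semaphore}} instances stand in for ISemaphore DOs, and
the {{inUse}} map is exactly the piece that would have to become a distributed
structure in a real cluster (which this sketch does not attempt).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Semaphore;

/** Hypothetical sketch: lock names are mapped onto a pool of reusable semaphores. */
final class PooledSemaphores {
    private static final class Slot {
        final Semaphore semaphore = new Semaphore(Integer.MAX_VALUE);
        int refCount;
    }

    private final Map<String, Slot> inUse = new HashMap<>(); // would need to be distributed
    private final Deque<Slot> idle = new ArrayDeque<>();
    private int created;

    /** Map the name onto a semaphore: existing mapping first, then an idle one, else a new one. */
    synchronized Semaphore acquireSemaphore(String lockName) {
        Slot slot = inUse.get(lockName);
        if (slot == null) {
            slot = idle.poll();
            if (slot == null) {
                slot = new Slot();
                created++;
            }
            inUse.put(lockName, slot);
        }
        slot.refCount++;
        return slot.semaphore;
    }

    /** Drop one use; once no uses remain, unmap the name and keep the semaphore for reuse. */
    synchronized void releaseSemaphore(String lockName) {
        Slot slot = inUse.get(lockName);
        if (slot == null) {
            throw new IllegalStateException("not acquired: " + lockName);
        }
        if (--slot.refCount == 0) {
            inUse.remove(lockName);
            idle.push(slot);
        }
    }

    /** Semaphores ever created; bounded by the maximum number of names in use at once. */
    synchronized int createdCount() {
        return created;
    }
}
```

Nothing is ever destroyed here, so the restriction on recreating destroyed DOs
is not hit; the open question of how costly a distributed mapping would be
remains untouched by this sketch.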

The benefit would be obvious: today the cluster holds as many ISemaphores as
there are Artifacts encountered, since cluster boot, by all the builds that
use the given cluster. With indirection, the number of DOs would be lowered to
"maximum concurrently used", so if you have a large build farm that is able to
juggle 1000 artifacts at any one moment, your cluster would have 1000
ISemaphores.

  was:
Originally (for today, see below) Hazelcast NamedLock implementation worked 
like this:
* on lock acquire, an ISemaphore DO with lock name is created (or just get, if 
exists), is refCounted
* on lock release, if refCount shows 0 = uses, ISemaphore was destroyed 
(releasing HZ cluster resources)
* if after some time, a new lock acquire happened for same name, ISemaphore DO 
would get re-created.

Today, HZ NamedLocks implementation works in following way:
* there is only one Semaphore provider implementation, the 
{{DirectHazelcastSemaphoreProvider}} that maps lock name 1:1 onto ISemaphore 
Distribute Object (DO) name and does not destroys the DO

Reason for this is historical: originally, named locks precursor code was done 
for Hazelcast 2/3, that used "unreliable" distributed operations, and 
recreating previously destroyed DO was possible (at the cost of 
"unreliability").

Since Hazelcast 4.x it updated to RAFT consensus algorithm and made things 
reliable, it was at the cost that DOs once created, then destroyed, could not 
be recreated anymore. This change was applied to 
{{DirectHazelcastSemaphoreProvider}} as well, by simply not dropping unused 
ISemaphores (release semaphore is no-op method).

But, this has an important consequence: a long running Hazelcast cluster will 
have more and more ISemaphore DOs (basically as many as many Artifacts all the 
builds met, that use this cluster to coordinate). Artifacts count existing out 
there is not infinite, but is large enough -- especially if cluster shared 
across many different/unrelated builds -- to grow over sane limit.

So, current recommendation is to have "large enough" dedicated Hazelcast 
cluster and  use {{semaphore-hazelcast-client}} (that is a "thin client" that 
connects to cluster), instead of {{semaphore-hazelcast}} (that is "thick 
client", so puts burden onto JVM process running it as node, hence Maven as 
well). But even then, regular reboot of cluster may be needed.

A proper but somewhat complicated solution would be to introduce some sort of 
indirection: create as many ISemaphore as needed at the moment, and map those 
onto locks names in use at the moment (and reuse unused semaphores). Problem 
is, that mapping would need to be distributed as well (so all clients pick them 
up, or perform new mapping), and this may cause performance penalty. But this 
could be proved by exhaustive perf testing only.

The benefit would be obvious: today cluster holds as many ISemaphores as many 
Artifacts were met by all the builds, that use given cluster since cluster 
boot. With indirection, the number of DOs would lowered to "maximum 
concurrently used", so if you have a large build farm, that is able to juggle 
with 1000 artifacts at given one moment, your cluster would have 1000 
ISemaphores.


> New strategy for Hazelcast named locks
> --------------------------------------
>
>                 Key: MRESOLVER-404
>                 URL: https://issues.apache.org/jira/browse/MRESOLVER-404
>             Project: Maven Resolver
>          Issue Type: Improvement
>          Components: Resolver
>            Reporter: Tamas Cservenak
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
