[
https://issues.apache.org/jira/browse/MRESOLVER-404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tamas Cservenak updated MRESOLVER-404:
--------------------------------------
Description:
Originally (for the current behavior, see below) the Hazelcast NamedLock
implementation worked like this:
* on lock acquire, an ISemaphore distributed object (DO) named after the lock
was created (or simply fetched, if it already existed) and ref-counted
* on lock release, if the refCount dropped to 0 (no more users), the ISemaphore
was destroyed (releasing HZ cluster resources)
* if, some time later, a new lock acquire happened for the same name, the
ISemaphore DO would get re-created.
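The original ref-counted scheme described above can be sketched roughly as
follows. This is only an illustration, not the resolver's actual code: the
class {{RefCountedSemaphores}} and its methods are hypothetical, and a local
{{java.util.concurrent.Semaphore}} stands in for the Hazelcast ISemaphore DO.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Semaphore;

/** Hypothetical sketch of the original ref-counted scheme (single JVM only). */
class RefCountedSemaphores {
    /** Local stand-in for one ISemaphore DO plus its use counter. */
    private static final class Holder {
        final Semaphore semaphore = new Semaphore(Integer.MAX_VALUE);
        int refCount;
    }

    private final Map<String, Holder> live = new HashMap<>();

    /** Creates the semaphore if absent, otherwise reuses it, and counts the use. */
    synchronized Semaphore acquire(String lockName) {
        Holder holder = live.computeIfAbsent(lockName, n -> new Holder());
        holder.refCount++;
        return holder.semaphore;
    }

    /** Drops one use; the semaphore is discarded once nobody uses it anymore. */
    synchronized void release(String lockName) {
        Holder holder = live.get(lockName);
        if (holder == null) {
            throw new IllegalStateException("not acquired: " + lockName);
        }
        if (--holder.refCount == 0) {
            // The real implementation would destroy the distributed object here.
            live.remove(lockName);
        }
    }

    /** Number of semaphores currently alive. */
    synchronized int liveCount() {
        return live.size();
    }
}
```

In this sketch, acquiring the same name twice returns the same semaphore, and a
later acquire after the last release creates a fresh one, which is exactly the
"re-create" step that the rest of this description says stopped working.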
Today, the HZ NamedLocks implementation works in the following way:
* there is only one Semaphore provider implementation,
{{DirectHazelcastSemaphoreProvider}}, which maps the lock name 1:1 onto an
ISemaphore Distributed Object (DO) name and never destroys the DO
The reason for this is historical: the precursor of the named locks code was
originally written for Hazelcast 2/3, which used "unreliable" distributed
operations, so recreating a previously destroyed DO was possible (at the cost
of "unreliability").
Hazelcast 4.x moved to the RAFT consensus algorithm, which made things
reliable, but at the cost that a DO, once created and then destroyed, cannot
be recreated anymore. {{DirectHazelcastSemaphoreProvider}} was adapted to this
change by simply not dropping unused ISemaphores (releasing a semaphore is a
no-op).
But this has an important consequence: a long-running Hazelcast cluster will
accumulate more and more ISemaphore DOs (basically one for every Artifact
encountered by any build that uses this cluster for coordination). The number
of Artifacts in existence is not infinite, but it is large enough, especially
if the cluster is shared across many different/unrelated builds, for the DO
count to grow beyond a sane limit.
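To make the growth concrete, here is a minimal sketch of a 1:1 mapping with a
no-op release. It is not the code of {{DirectHazelcastSemaphoreProvider}}: the
class {{DirectSemaphores}} is hypothetical, and a local map of
{{java.util.concurrent.Semaphore}} instances stands in for the cluster and its
ISemaphore DOs.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Semaphore;

/** Hypothetical sketch: lock name maps 1:1 to a semaphore that is never dropped. */
class DirectSemaphores {
    /** Local stand-in for the DOs held by the cluster. */
    private final ConcurrentMap<String, Semaphore> cluster = new ConcurrentHashMap<>();

    /** Returns the semaphore for this name, creating it on first use. */
    Semaphore acquire(String lockName) {
        return cluster.computeIfAbsent(lockName, n -> new Semaphore(Integer.MAX_VALUE));
    }

    /** Intentionally a no-op: a destroyed DO could not be recreated later. */
    void release(String lockName) {
        // nothing to do
    }

    /** Number of semaphores held; in this sketch it never decreases. */
    int liveCount() {
        return cluster.size();
    }
}
```

With this shape, {{liveCount()}} equals the number of distinct lock names ever
seen, no matter how many of them are still in use.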
So the current recommendation is to run a "large enough" dedicated Hazelcast
cluster and use {{semaphore-hazelcast-client}} (a "thin client" that connects
to the cluster) instead of {{semaphore-hazelcast}} (a "thick client" that makes
the JVM process running it, and hence Maven, a cluster node and puts the burden
on it). But even then, a regular reboot of the cluster may be needed.
A proper but somewhat complicated solution would be to introduce some sort of
indirection: create only as many ISemaphores as are needed at the moment, and
map them onto the lock names in use at the moment (reusing unused semaphores).
The problem is that the mapping would need to be distributed as well (so that
all clients pick it up, or perform a new mapping), and this may cause a
performance penalty. Whether it does could only be established by exhaustive
perf testing.
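One possible shape of that indirection, as a rough sketch only: lock names in
use are mapped onto pooled semaphore names, and a semaphore whose last user is
gone returns to a free list instead of being destroyed. Everything here is an
assumption for illustration (the class {{PooledSemaphores}}, the
{{semaphore-N}} naming, the in-memory map). In particular, the sketch keeps the
mapping in a single JVM, so it does not address the hard part named above,
namely distributing the mapping to all clients.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the indirection: lock name -> reusable pooled semaphore. */
class PooledSemaphores {
    /** One pooled semaphore, identified by a stable name, plus its use counter. */
    private static final class Slot {
        final String semaphoreName;
        int refCount;

        Slot(String semaphoreName) {
            this.semaphoreName = semaphoreName;
        }
    }

    private final Map<String, Slot> inUse = new HashMap<>();
    private final Deque<Slot> free = new ArrayDeque<>();
    private int created;

    /** Returns the name of the pooled semaphore that backs this lock name. */
    synchronized String acquire(String lockName) {
        Slot slot = inUse.get(lockName);
        if (slot == null) {
            // Reuse a free semaphore if there is one; create a new one only when needed.
            slot = free.isEmpty() ? new Slot("semaphore-" + created++) : free.pop();
            inUse.put(lockName, slot);
        }
        slot.refCount++;
        return slot.semaphoreName;
    }

    /** Drops one use; an unused semaphore goes back to the pool, it is not destroyed. */
    synchronized void release(String lockName) {
        Slot slot = inUse.get(lockName);
        if (slot == null) {
            throw new IllegalStateException("not acquired: " + lockName);
        }
        if (--slot.refCount == 0) {
            inUse.remove(lockName);
            free.push(slot);
        }
    }

    /** Total semaphores ever created: bounded by the peak number of names in use. */
    synchronized int createdCount() {
        return created;
    }
}
```

In this sketch {{createdCount()}} never exceeds the largest number of lock names
that were in use at the same time, and no semaphore is ever destroyed, so the
"cannot be recreated" restriction is never hit.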
The benefit would be obvious: today the cluster holds as many ISemaphores as
there were Artifacts encountered by all the builds that have used the cluster
since it booted. With indirection, the number of DOs would be lowered to the
"maximum concurrently used", so if you have a large build farm that is able to
juggle 1000 artifacts at any given moment, your cluster would have 1000
ISemaphores.
> New strategy for Hazelcast named locks
> --------------------------------------
>
> Key: MRESOLVER-404
> URL: https://issues.apache.org/jira/browse/MRESOLVER-404
> Project: Maven Resolver
> Issue Type: Improvement
> Components: Resolver
> Reporter: Tamas Cservenak
> Priority: Major