[
https://issues.apache.org/jira/browse/IGNITE-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16339529#comment-16339529
]
Tim Onyschak commented on IGNITE-7090:
--------------------------------------
It will happen whenever there are no acquirers, *period*, whether the node failed
or went down gracefully. If zero clients remain after a semaphore was created
and a permit was acquired, any future client will get stuck waiting to acquire,
since it believes the permit is still held by the node that went away.
So there are two proposed solutions:
# On graceful shutdown, call onNodeRemoved.
# On startup, check the semaphore state.
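A toy, single-JVM model may make the failure mode and option 2 concrete. It is a simplified sketch, not Ignite's GridCacheSemaphoreState: the class `ToySemaphoreState`, its methods, and the idea of passing in a set of live node IDs are hypothetical, and it assumes Java 9+ for `Set.of`. Permits are tracked per owning node; once the owner leaves and nothing cleans up its entry, a new client can never acquire, until a startup check drops permits attributed to nodes absent from the live topology.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical toy model of a distributed semaphore's shared state (not Ignite code). */
public class ToySemaphoreState {
    private final int totalPermits;
    // Node ID -> number of permits currently attributed to that node.
    private final Map<String, Integer> holders = new HashMap<>();

    public ToySemaphoreState(int totalPermits) {
        this.totalPermits = totalPermits;
    }

    public int availablePermits() {
        int held = holders.values().stream().mapToInt(Integer::intValue).sum();
        return totalPermits - held;
    }

    /** Non-blocking acquire; a real client would keep waiting while this returns false. */
    public boolean tryAcquire(String nodeId) {
        if (availablePermits() == 0)
            return false;
        holders.merge(nodeId, 1, Integer::sum);
        return true;
    }

    /** Option 2 (hypothetical): on startup, drop permits held by nodes that are no longer live. */
    public void releasePermitsOfDepartedNodes(Set<String> liveNodeIds) {
        holders.keySet().retainAll(liveNodeIds);
    }

    public static void main(String[] args) {
        ToySemaphoreState state = new ToySemaphoreState(1);
        state.tryAcquire("nodeA"); // nodeA takes the only permit, then leaves the cluster
        System.out.println(state.tryAcquire("nodeB")); // false: permit still attributed to nodeA
        state.releasePermitsOfDepartedNodes(Set.of("nodeB"));
        System.out.println(state.tryAcquire("nodeB")); // true: stale permit reclaimed
    }
}
```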
For #1, you noted that this only covers graceful shutdown and that we should
also plan for non-graceful shutdown. I completely agree.
For #2, I found while debugging DataStructuresProcessor#semaphore() that the
closure (where we have explicit access to GridCacheSemaphoreState) is only
executed during initial creation. Getting access to GridCacheSemaphoreState
afterwards would therefore require a lot of casting and instanceof checks in
DataStructuresProcessor#getAtomic(). I was trying to avoid that by suggesting
an interface that specific data structures can implement when we want them to
proactively check their state and act accordingly.
Does that make sense?
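A minimal sketch of the interface idea, assuming a lookup shaped roughly like getAtomic() that can be given the current set of live node IDs. The names `StateCheckable`, `checkState`, `SketchSemaphore`, and this `getAtomic` signature are hypothetical illustrations, not existing Ignite API; it assumes Java 9+ for `Set.of`. The point is that the lookup needs a single interface check instead of one cast per data structure type.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class StateCheckSketch {
    /** Hypothetical interface: data structures that re-validate their state when looked up. */
    interface StateCheckable {
        void checkState(Set<String> liveNodeIds);
    }

    /** Hypothetical semaphore stand-in that tracks permit owners by node ID. */
    static class SketchSemaphore implements StateCheckable {
        final Map<String, Integer> holders = new HashMap<>();

        @Override
        public void checkState(Set<String> liveNodeIds) {
            // Release permits still attributed to nodes that have left the topology.
            holders.keySet().retainAll(liveNodeIds);
        }
    }

    /** Stand-in for a getAtomic()-style lookup: one interface check replaces per-type casts. */
    static Object getAtomic(Map<String, Object> dsMap, String name, Set<String> liveNodeIds) {
        Object ds = dsMap.get(name);
        if (ds instanceof StateCheckable)
            ((StateCheckable) ds).checkState(liveNodeIds);
        return ds;
    }

    public static void main(String[] args) {
        SketchSemaphore sem = new SketchSemaphore();
        sem.holders.put("nodeA", 1); // permit left behind by a departed node
        Map<String, Object> dsMap = new HashMap<>();
        dsMap.put("sem", sem);
        getAtomic(dsMap, "sem", Set.of("nodeB"));
        System.out.println(sem.holders.isEmpty()); // true
    }
}
```

Data structures that have nothing to verify simply do not implement the interface, so the lookup path stays unchanged for them.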
> Semaphore Stuck when no acquirers to assign permit
> --------------------------------------------------
>
> Key: IGNITE-7090
> URL: https://issues.apache.org/jira/browse/IGNITE-7090
> Project: Ignite
> Issue Type: Bug
> Components: cache, data structures
> Affects Versions: 2.1, 2.4
> Reporter: Tim Onyschak
> Priority: Major
> Fix For: 2.5
>
> Attachments: SemaphoreFailoverNoWaitingAcquirerTest.java
>
>
> If no acquirers are available to take the semaphore's permit, the permit never
> gets released and any further acquirers will wait forever.
> On node shutdown, DataStructuresProcessor.dsMap gets cleared before the
> event listener is able to execute onNodeRemoved, hence the owner is never
> cleared if the permit could not be passed to a different acquirer.
>
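The ordering problem in the description can be reduced to a few lines. This is a hypothetical, heavily simplified model (assuming Java 9+), not Ignite's actual shutdown path: a local set stands in for DataStructuresProcessor.dsMap, a map stands in for shared semaphore state, and a lambda stands in for onNodeRemoved, which only cleans up structures it can still find locally. Clearing the local map first leaves the departed node recorded as owner.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ShutdownOrderSketch {
    /**
     * Simulates "nodeA" leaving while it owns the permit of semaphore "sem".
     * Returns the owner recorded in shared state afterwards (null means released).
     */
    static String ownerAfterShutdown(boolean clearDsMapFirst) {
        // Stand-in for the node-local DataStructuresProcessor.dsMap.
        Set<String> dsMap = new HashSet<>(Set.of("sem"));
        // Stand-in for shared state that outlives the node: semaphore name -> owning node.
        Map<String, String> sharedOwner = new HashMap<>(Map.of("sem", "nodeA"));

        Runnable clearLocalMap = dsMap::clear;
        // Stand-in for onNodeRemoved: only cleans up structures still present in the local map.
        Runnable onNodeRemoved = () -> {
            for (String name : dsMap)
                sharedOwner.remove(name, "nodeA");
        };

        if (clearDsMapFirst) {
            clearLocalMap.run();
            onNodeRemoved.run();
        } else {
            onNodeRemoved.run();
            clearLocalMap.run();
        }
        return sharedOwner.get("sem");
    }

    public static void main(String[] args) {
        System.out.println(ownerAfterShutdown(true));  // nodeA: permit stuck with the departed node
        System.out.println(ownerAfterShutdown(false)); // null: permit released
    }
}
```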
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)