[ 
https://issues.apache.org/jira/browse/IGNITE-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16339529#comment-16339529
 ] 

Tim Onyschak commented on IGNITE-7090:
--------------------------------------

It will happen whenever there are no acquirers, *period*, whether the node 
failed or went down gracefully. So if 0 clients exist after a semaphore was 
created and a permit acquired, any future client will get stuck waiting to 
acquire, since it believes the permit is still held by the node which went away. 
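
To make the failure mode concrete, here is a toy model in plain Java. It is 
not Ignite code: StuckPermitSketch, SemaphoreState and tryAcquire are 
hypothetical stand-ins, and only the name onNodeRemoved is borrowed from the 
issue. It shows that if the cleanup never runs for the departed node, the 
permit stays assigned to it and the next client cannot acquire.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Toy model of the stuck permit; hypothetical names, not Ignite internals. */
class StuckPermitSketch {
    /** Simplified stand-in for the shared semaphore state. */
    static final class SemaphoreState {
        private int available;
        private final Map<UUID, Integer> heldByNode = new HashMap<>();

        SemaphoreState(int permits) {
            this.available = permits;
        }

        /** Non-blocking acquire: returns false where a real client would block. */
        boolean tryAcquire(UUID nodeId) {
            if (available == 0)
                return false;
            available--;
            heldByNode.merge(nodeId, 1, Integer::sum);
            return true;
        }

        /** Cleanup that should run when a node leaves: hand back its permits. */
        void onNodeRemoved(UUID nodeId) {
            Integer held = heldByNode.remove(nodeId);
            if (held != null)
                available += held;
        }

        int available() {
            return available;
        }
    }

    public static void main(String[] args) {
        UUID nodeA = UUID.randomUUID();
        UUID nodeB = UUID.randomUUID();

        SemaphoreState state = new SemaphoreState(1);
        state.tryAcquire(nodeA); // nodeA takes the only permit, then goes away

        // Cleanup was skipped, so the permit is still booked to nodeA.
        System.out.println("nodeB acquires without cleanup: " + state.tryAcquire(nodeB));

        // Once the cleanup runs, the permit is free again.
        state.onNodeRemoved(nodeA);
        System.out.println("nodeB acquires after cleanup: " + state.tryAcquire(nodeB));
    }
}
```

In the model the first attempt by nodeB fails and the second succeeds, which 
is the behaviour both proposed solutions are trying to restore.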

So there are two noted solutions:
 # On graceful shutdown call onNodeRemoved
 # On startup check Semaphore state.

For 1, you stated that this only covers graceful shutdown and that we should 
also plan for non-graceful shutdown. I completely agree with you.

For 2, I found when debugging DataStructuresProcessor#semaphore() that the 
closure (where we have explicit access to GridCacheSemaphoreState) is only 
executed during initial creation. So getting access to GridCacheSemaphoreState 
would require a bunch of casting and instanceof checks in 
DataStructuresProcessor#getAtomic(). I was trying to avoid that by suggesting 
an interface that we can use on the specific data structures whose state we 
want to proactively check and act on accordingly.
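
A rough sketch of what I mean, in plain Java rather than against the real 
classes. Everything here is an assumption for illustration: StateCheckable, 
checkState, ToySemaphore, ToyAtomicLong and lookup are hypothetical names, and 
none of them exist in Ignite. The point is only that the lookup path needs a 
single instanceof check against the interface instead of per-type casts.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

/** Sketch of an opt-in state check; hypothetical names, not Ignite's API. */
class StateCheckSketch {
    /** Implemented only by data structures that can repair their own state. */
    interface StateCheckable {
        /** Reclaims whatever is held by nodes that are gone; returns the count. */
        int checkState(Set<String> aliveNodeIds);
    }

    /** Toy semaphore that remembers which node holds how many permits. */
    static final class ToySemaphore implements StateCheckable {
        private int available;
        private final Map<String, Integer> owners = new HashMap<>();

        ToySemaphore(int permits) {
            this.available = permits;
        }

        boolean tryAcquire(String nodeId) {
            if (available == 0)
                return false;
            available--;
            owners.merge(nodeId, 1, Integer::sum);
            return true;
        }

        @Override
        public int checkState(Set<String> aliveNodeIds) {
            int reclaimed = 0;
            // Use the iterator so entries can be removed while walking the map.
            Iterator<Map.Entry<String, Integer>> it = owners.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, Integer> e = it.next();
                if (!aliveNodeIds.contains(e.getKey())) {
                    reclaimed += e.getValue();
                    it.remove();
                }
            }
            available += reclaimed;
            return reclaimed;
        }
    }

    /** A structure that has nothing to repair and so skips the interface. */
    static final class ToyAtomicLong {
        long value;
    }

    /** Stand-in for the generic lookup path: one check, no per-type casting. */
    static <T> T lookup(T dataStructure, Set<String> aliveNodeIds) {
        if (dataStructure instanceof StateCheckable)
            ((StateCheckable) dataStructure).checkState(aliveNodeIds);
        return dataStructure;
    }

    public static void main(String[] args) {
        ToySemaphore sem = new ToySemaphore(1);
        sem.tryAcquire("nodeA"); // nodeA takes the permit and then disappears

        // A new client starts up; only nodeB is in the topology now.
        ToySemaphore found = lookup(sem, Collections.singleton("nodeB"));
        System.out.println("nodeB acquires after state check: " + found.tryAcquire("nodeB"));

        lookup(new ToyAtomicLong(), Collections.singleton("nodeB")); // untouched
    }
}
```

Because the check runs on lookup rather than on node departure, it would not 
matter whether the previous holder shut down gracefully or not.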

 

Does that make sense?

 

> Semaphore Stuck when no acquirers to assign permit
> --------------------------------------------------
>
>                 Key: IGNITE-7090
>                 URL: https://issues.apache.org/jira/browse/IGNITE-7090
>             Project: Ignite
>          Issue Type: Bug
>          Components: cache, data structures
>    Affects Versions: 2.1, 2.4
>            Reporter: Tim Onyschak
>            Priority: Major
>             Fix For: 2.5
>
>         Attachments: SemaphoreFailoverNoWaitingAcquirerTest.java
>
>
> If no acquirers are available to take the semaphore's permit, the permit 
> never gets released and any further acquirers will wait forever. 
> On node shutdown, DataStructuresProcessor.dsMap gets cleared out before the 
> event listener is able to execute onNodeRemoved, hence the owner is never 
> cleared out if the permit could not be passed to a different acquirer. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
