Hi Sergey, thanks for finding and submitting this bug!
Best regards,
Vladisav

On Thu, Nov 10, 2016 at 1:46 PM, Sergey Chugunov <sergey.chugu...@gmail.com> wrote:

> Hello Vladisav,
>
> Thanks for the confirmation!
>
> I created a JIRA <https://issues.apache.org/jira/browse/IGNITE-4209> to
> track this issue; feel free to edit it if it isn't descriptive enough.
>
> Thank you,
> Sergey.
>
> On Thu, Nov 10, 2016 at 9:44 AM, Vladisav Jelisavcic <vladis...@gmail.com> wrote:
>
> > Hi Sergey,
> >
> > You are right, I can reproduce this too.
> > It seems to me that this happens because we treat EVT_NODE_LEFT and
> > EVT_NODE_FAILED events the same way.
> > In this case the node leaves the topology without failure but fails to
> > release the semaphore before the EVT_NODE_LEFT event occurs on other
> > nodes. This really is a bug.
> >
> > Thanks!
> > Vladisav
> >
> > On Wed, Nov 9, 2016 at 11:23 AM, Sergey Chugunov <sergey.chugu...@gmail.com> wrote:
> >
> > > Hello Vladisav,
> > >
> > > I found this behavior in a very simple environment: I had two nodes
> > > on my local machine started by the *ExampleNodeStartup* class and
> > > another node started with the *IgniteSemaphoreExample* class.
> > >
> > > I made no modifications to any code or configuration and used the
> > > latest code available in the master branch.
> > > No node failures occurred during test execution either.
> > >
> > > As far as I understand from a short investigation, the
> > > synchronization semaphore named "IgniteSemaphoreExample" goes into a
> > > broken state when the *IgniteSemaphoreExample* node finishes normally
> > > and disconnects from the cluster.
> > > After that, reusing this semaphore becomes impossible, and new nodes
> > > that try to do so hang.
> > >
> > > Can you reproduce this? If so, I will submit a ticket and share it
> > > with you.
> > >
> > > Thank you,
> > > Sergey.
> > >
> > > On Wed, Nov 9, 2016 at 10:55 AM, Vladisav Jelisavcic <vladis...@gmail.com> wrote:
> > >
> > > > Hi Sergey,
> > > >
> > > > Can you please provide more information?
> > > > Have you changed the example? (If so, can you provide the changes
> > > > you made?)
> > > > Was the example executed normally (without node failures)?
> > > >
> > > > In the example, the semaphore is created in non-failover-safe mode,
> > > > which means it is not safe to use once it is broken (something like
> > > > CyclicBarrier in java.util.concurrent). The semaphore is preserved
> > > > even if the first node fails (if backups are configured), so if the
> > > > first node failed, the (broken) semaphore with the same name should
> > > > still be in the cache, and this is expected behavior.
> > > >
> > > > If this is not the case (the test was executed normally), then
> > > > please submit a ticket describing your setup in more detail: how
> > > > many nodes, how many backups configured, etc.
> > > >
> > > > Thanks!
> > > > Vladisav
> > > >
> > > > On Tue, Nov 8, 2016 at 10:37 AM, Sergey Chugunov <sergey.chugu...@gmail.com> wrote:
> > > >
> > > > > Hello folks,
> > > > >
> > > > > I found the reason why *IgniteSemaphoreExample* hangs when started
> > > > > twice without restarting the cluster, and it doesn't seem minor to
> > > > > me anymore.
> > > > >
> > > > > From here on I'm going to refer to the example's code, so please
> > > > > have it open.
> > > > >
> > > > > When the first instance of the node running the example code
> > > > > finishes and leaves the cluster, the synchronization semaphore
> > > > > named "IgniteSemaphoreExample" goes into a broken state on all
> > > > > other cluster nodes.
> > > > > If I restart the example without restarting all nodes of the
> > > > > cluster, the final *acquire* call on the semaphore on the client
> > > > > side hangs because all other nodes treat it as broken and don't
> > > > > increase permits with their *release* calls on it.
> > > > >
> > > > > There is an interesting comment inside its *tryReleaseShared*
> > > > > implementation (BTW it is implemented in
> > > > > *GridCacheSemaphoreImpl*):
> > > > >
> > > > > "// If broken, return immediately, exception will be thrown anyway.
> > > > > if (broken)
> > > > >     return true;"
> > > > >
> > > > > It seems that no exceptions are thrown either on the client side
> > > > > calling *acquire* or on the server side calling *release* on a
> > > > > broken semaphore.
> > > > >
> > > > > Does anybody know why it behaves that way? Is it expected behavior
> > > > > at all, and if so, where is it documented?
> > > > >
> > > > > Thanks,
> > > > > Sergey Chugunov.
> > > > >
> > >
> > > --
> > > Best regards,
> > > Sergey Chugunov.
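
The hang described in this thread can be illustrated with a minimal, self-contained sketch. This is not Ignite's code. The names BrokenSemaphoreSketch, Sync, markBroken and releaseThenTryAcquire are hypothetical, and only the early return for a broken semaphore is modelled on the snippet quoted above from *tryReleaseShared*. The sketch assumes a plain java.util.concurrent AbstractQueuedSynchronizer and uses a timed acquire, so it returns false where the real example would hang.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

public class BrokenSemaphoreSketch {
    /** Hypothetical stand-in for a semaphore synchronizer with a "broken" flag. */
    static final class Sync extends AbstractQueuedSynchronizer {
        private volatile boolean broken;

        Sync(int permits) {
            setState(permits);
        }

        void markBroken() {
            broken = true;
        }

        @Override
        protected int tryAcquireShared(int acquires) {
            for (;;) {
                int available = getState();
                int remaining = available - acquires;
                // A negative result tells AQS to park the caller.
                if (remaining < 0 || compareAndSetState(available, remaining))
                    return remaining;
            }
        }

        @Override
        protected boolean tryReleaseShared(int releases) {
            // Same shape as the quoted snippet: report success without
            // adding permits and without throwing anything here.
            if (broken)
                return true;
            for (;;) {
                int current = getState();
                if (compareAndSetState(current, current + releases))
                    return true;
            }
        }
    }

    /** Releases one permit, then tries to acquire it with a 100 ms timeout. */
    static boolean releaseThenTryAcquire(boolean broken) {
        Sync sync = new Sync(0);
        if (broken)
            sync.markBroken();
        sync.releaseShared(1);
        try {
            return sync.tryAcquireSharedNanos(1, TimeUnit.MILLISECONDS.toNanos(100));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("healthy: " + releaseThenTryAcquire(false));
        System.out.println("broken: " + releaseThenTryAcquire(true));
    }
}
```

In this sketch the release on a broken semaphore is silently dropped, so a later acquire has no permit to take. An untimed acquire would wait indefinitely, which matches the symptom reported above.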