Hmm, I cannot reproduce this behavior locally. My guess is that the interrupt flag is not always cleared properly in the #GridCacheSemaphore.acquire method (but it doesn't have anything to do with the latest fix).
Can you make it reproducible?

On Fri, Apr 14, 2017 at 2:46 PM, Dmitry Karachentsev <dkarachent...@gridgain.com> wrote:

> Vladislav,
>
> One more thing: this test [1] started failing on semaphore close when this
> fix [2] was introduced.
> Could you check it please?
>
> [1] http://ci.ignite.apache.org/viewLog.html?buildId=547151&tab=buildResultsDiv&buildTypeId=IgniteTests_IgniteDataStrucutures#testNameId-979977708202725050
> [2] https://issues.apache.org/jira/browse/IGNITE-1977
>
> Thanks!
>
> 14.04.2017 15:27, Dmitry Karachentsev wrote:
>
> Vladislav,
>
> Yep, you're right. I'll fix it.
>
> Thanks!
>
> 14.04.2017 15:18, Vladisav Jelisavcic wrote:
>
> Hi Dmitry,
>
> it looks to me that this test is not valid: after semaphore 2 fails,
> the permits are redistributed, so the expected number of permits should
> really be 20, not 10. Do you agree?
>
> I guess before the latest fix this test was (incorrectly) passing because
> permits weren't released properly.
>
> What do you think?
>
> On Fri, Apr 14, 2017 at 11:27 AM, Dmitry Karachentsev <dkarachent...@gridgain.com> wrote:
>
>> Hi Vladislav,
>>
>> It looks like after the fix was merged these tests [1] started failing.
>> Could you please take a look?
>>
>> [1] http://ci.ignite.apache.org/viewLog.html?buildId=544238&tab=buildResultsDiv&buildTypeId=IgniteTests_IgniteBinaryObjectsDataStrucutures
>>
>> Thanks!
>>
>> -Dmitry.
>>
>> 13.04.2017 16:15, Dmitry Karachentsev wrote:
>>
>> Thanks a lot!
>>
>> 12.04.2017 16:35, Vladisav Jelisavcic wrote:
>>
>> Hi Dmitry,
>>
>> sure, I made a fix, take a look at the PR and the comments in the ticket.
>>
>> Best regards,
>> Vladisav
>>
>> On Tue, Apr 11, 2017 at 3:00 PM, Dmitry Karachentsev <dkarachent...@gridgain.com> wrote:
>>
>>> Hi Vladislav,
>>>
>>> Thanks for your contribution! But it seems it doesn't fix the related
>>> tickets, in particular [1].
>>> Could you please take a look?
>>>
>>> [1] https://issues.apache.org/jira/browse/IGNITE-4173
>>>
>>> Thanks!
>>>
>>> 06.04.2017 16:27, Vladisav Jelisavcic wrote:
>>>
>>> Hey Dmitry,
>>>
>>> sorry for the late reply, I'll try to bake a PR later during the day.
>>>
>>> Best regards,
>>> Vladisav
>>>
>>> On Tue, Apr 4, 2017 at 11:05 AM, Dmitry Karachentsev <dkarachent...@gridgain.com> wrote:
>>>
>>>> Hi Vladislav,
>>>>
>>>> I see you've been developing [1] for a while. Did you have any chance
>>>> to fix it? If not, is there any estimate?
>>>>
>>>> [1] https://issues.apache.org/jira/browse/IGNITE-1977
>>>>
>>>> Thanks!
>>>>
>>>> -Dmitry.
>>>>
>>>> 20.03.2017 10:28, Alexey Goncharuk wrote:
>>>>
>>>>> I think re-creation should be handled by a user who will make sure
>>>>> that nobody else is currently executing the guarded logic before the
>>>>> re-creation. This is exactly the same semantics as with
>>>>> BrokenBarrierException for j.u.c.CyclicBarrier.
>>>>>
>>>>> 2017-03-17 2:39 GMT+03:00 Vladisav Jelisavcic <vladis...@gmail.com>:
>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> I agree with Val, he's got a point; recreating the lock doesn't seem
>>>>>> possible (at least not with the transactional cache lock/semaphore
>>>>>> we have).
>>>>>> Is this re-create behavior really needed?
>>>>>>
>>>>>> Best regards,
>>>>>> Vladisav
>>>>>>
>>>>>> On Thu, Mar 16, 2017 at 8:34 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
>>>>>>
>>>>>>> Guys,
>>>>>>>
>>>>>>> How does recreation of the lock help? My understanding is that the
>>>>>>> scenario is the following:
>>>>>>>
>>>>>>> 1. Client A creates and acquires a lock, and then starts to execute
>>>>>>>    guarded logic.
>>>>>>> 2. Client B tries to acquire the same lock and parks to wait.
>>>>>>> 3. Before client A unlocks, all affinity nodes for the lock fail,
>>>>>>>    and the lock disappears from the cache.
>>>>>>> 4. Client B fails with an exception, recreates the lock, acquires it,
>>>>>>>    and starts to execute guarded logic concurrently with client A.
>>>>>>>
>>>>>>> In my view this is wrong anyway, regardless of whether this happens
>>>>>>> silently or with an exception handled in the user's code, because
>>>>>>> this code doesn't have any way to know if client A still holds the
>>>>>>> lock or not.
>>>>>>>
>>>>>>> Am I missing something?
>>>>>>>
>>>>>>> -Val
>>>>>>>
>>>>>>> On Tue, Mar 14, 2017 at 10:14 AM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
>>>>>>>
>>>>>>>> On Tue, Mar 14, 2017 at 12:46 AM, Alexey Goncharuk <alexey.goncha...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>>> Which user operation would result in exception? To my knowledge,
>>>>>>>>>> user may already be holding the lock and not invoking any Ignite
>>>>>>>>>> APIs, no?
>>>>>>>>>
>>>>>>>>> Yes, this is exactly my point.
>>>>>>>>>
>>>>>>>>> Imagine that a node already holds a lock and another node is
>>>>>>>>> waiting for the lock. If all partition nodes leave the grid and
>>>>>>>>> the lock is re-created, this second node will immediately acquire
>>>>>>>>> the lock and we will have two lock owners. I think in this case
>>>>>>>>> this second node (blocked on lock()) should get an exception
>>>>>>>>> saying that the lock was lost (which is, by the way, the current
>>>>>>>>> behavior), and the first node should get an exception on unlock.
>>>>>>>>
>>>>>>>> Makes sense.
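
For reference, the j.u.c.CyclicBarrier semantics mentioned in the thread can be sketched with plain JDK code. This is only an illustration of the analogy, not Ignite code: the class name BrokenBarrierSketch and the helpers awaitOutcome and demo are made up for this example. One waiting party is interrupted, which breaks the barrier; the other waiting party is woken with BrokenBarrierException instead of proceeding silently, and the caller resets the barrier only after both threads have finished.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration class; not part of Ignite.
class BrokenBarrierSketch {
    /** Waits on the barrier and reports how the wait ended. */
    static String awaitOutcome(CyclicBarrier barrier) {
        try {
            barrier.await();
            return "passed";
        } catch (BrokenBarrierException e) {
            // The barrier was broken while this thread was parked on it.
            return "broken";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return "interrupted";
        }
    }

    /** Returns {outcome of A, outcome of B, isBroken before reset, isBroken after reset}. */
    static String[] demo() throws InterruptedException {
        // Three parties are required, but only two ever arrive, so both park.
        CyclicBarrier barrier = new CyclicBarrier(3);
        AtomicReference<String> a = new AtomicReference<>();
        AtomicReference<String> b = new AtomicReference<>();
        Thread ta = new Thread(() -> a.set(awaitOutcome(barrier)));
        Thread tb = new Thread(() -> b.set(awaitOutcome(barrier)));
        ta.start();
        tb.start();

        // Wait until both threads are parked on the barrier.
        while (barrier.getNumberWaiting() < 2) {
            Thread.sleep(10);
        }

        // Interrupting one waiter breaks the barrier for every waiter.
        ta.interrupt();
        ta.join();
        tb.join();

        boolean brokenBeforeReset = barrier.isBroken();
        // "Re-creation" is an explicit caller decision, made once nobody is using the barrier.
        barrier.reset();
        return new String[] {
            a.get(), b.get(),
            String.valueOf(brokenBeforeReset), String.valueOf(barrier.isBroken())
        };
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(String.join(", ", demo()));
    }
}
```

The point of the analogy is that the primitive never repairs itself behind the waiters' backs: each blocked party learns about the failure through an exception, and recovery is left to code that can check that no one is still running the guarded logic.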