I see now. Thank you. Nikolay, could you please merge this change?
On Thu, 15 Mar 2018 at 18:48, Vyacheslav Daradur <daradu...@gmail.com> wrote:
> In brief:
> Nodes in *separate* JVMs are shut down by the compute task *StopGridTask*, which is sent from the *local* JVM *synchronously*. That means the *local* node must wait for the task to finish.
>
> At the same time, the node in the *separate* JVM executes the received *StopGridTask*, which *synchronously* calls *G.stop(igniteInstanceName, FALSE)*. That call waits for all compute tasks to finish, including the *StopGridTask* that invoked it.
>
> So we have a kind of deadlock:
> the *local* node waits for the compute task to finish, the task waits for *G.stop* to finish, and *G.stop* waits for all compute tasks to finish, including *StopGridTask*.
>
> We have not noticed this before because our tests use only stopAllGrids(), which stops the local JVM without waiting for nodes in other JVMs.
>
> On Thu, Mar 15, 2018 at 6:11 PM, Dmitry Pavlov <dpavlov....@gmail.com> wrote:
>> Please address the comments in the PR.
>>
>> I did not fully understand why the sync GridStopMessage was lost while the async one will succeed. Probably we need to discuss it briefly.
>>
>> On Thu, 1 Mar 2018 at 12:11, Vyacheslav Daradur <daradu...@gmail.com> wrote:
>>> Thank you, Dmitry!
>>>
>>> I'll join this review soon.
>>>
>>> On Thu, Mar 1, 2018 at 12:07 PM, Dmitry Pavlov <dpavlov....@gmail.com> wrote:
>>>> Hi Vyacheslav,
>>>>
>>>> I will take a look, but first of all I am going to review https://reviews.ignite.apache.org/ignite/review/IGNT-CR-502, since it is an impactful change to the testing framework. I hope you will join this review too.
>>>>
>>>> Sincerely,
>>>> Dmitiry Pavlov
>>>>
>>>> On Thu, 1 Mar 2018 at 11:13, Vyacheslav Daradur <daradu...@gmail.com> wrote:
>>>>> Hi, Dmitry, could you please review it? You are one of the most experienced people with the testing framework.
>>>>>
>>>>> Please see the comment in Jira, because it is nicely formatted there.
>>>>>
>>>>> On Thu, Feb 22, 2018 at 11:56 AM, Vyacheslav Daradur <daradu...@gmail.com> wrote:
>>>>>> Hi Igniters!
>>>>>>
>>>>>> I have investigated the issue [1] and found that stopping a node in a separate JVM may hang a thread or leave the system process alive after the test has finished.
>>>>>> The main reason is *StopGridTask*, which we send from the node in the local JVM to the node in the separate JVM via remote compute.
>>>>>> We send the job synchronously to be sure that the node will be stopped, but the job synchronously calls *G.stop(igniteInstanceName, cancel)* with *cancel = false*, which means the node must wait for compute jobs to finish before it goes down, and this leads to a kind of deadlock. Using *cancel = true* would solve the issue but might break some tests' logic; for this reason, I've reworked the method's synchronization logic [2].
>>>>>>
>>>>>> We have not noticed this before because our tests use only *stopAllGrids()*, which stops the local JVM without waiting for nodes in other JVMs.
>>>>>> I believe this fix should reduce the number of flaky tests on TeamCity, especially those that fail because a cluster from the previous test has not been stopped properly.
>>>>>>
>>>>>> CI tests [3] look a bit better than master.
>>>>>> Please review the prepared PR [2] and share your thoughts.
>>>>>>
>>>>>> [1] https://issues.apache.org/jira/browse/IGNITE-5910
>>>>>> [2] https://github.com/apache/ignite/pull/2382
>>>>>> [3] https://ci.ignite.apache.org/viewLog.html?buildId=1105939
>>>>>>
>>>>>> On Fri, Aug 4, 2017 at 11:41 AM, Vyacheslav Daradur <daradu...@gmail.com> wrote:
>>>>>>> Hi Igniters,
>>>>>>>
>>>>>>> While working on my task, I found a bug when calling the method #stopGrid(name): it produced a ClassCastException. I created a ticket [1].
>>>>>>>
>>>>>>> After it was fixed [2], I saw that nodes started in a separate JVM could remain alive as operating system processes.
>>>>>>> That was fixed too, but I'm not sure whether it was fixed the proper way.
>>>>>>>
>>>>>>> Could someone review it?
>>>>>>>
>>>>>>> [1] https://issues.apache.org/jira/browse/IGNITE-5910
>>>>>>> [2] https://github.com/apache/ignite/pull/2382
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards, Vyacheslav D.
>>>>>>
>>>>>> --
>>>>>> Best Regards, Vyacheslav D.
>>>>>
>>>>> --
>>>>> Best Regards, Vyacheslav D.
>>>
>>> --
>>> Best Regards, Vyacheslav D.
>
> --
> Best Regards, Vyacheslav D.
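The self-wait described in this thread can be reproduced with plain JDK concurrency, without Ignite. This is a minimal sketch under stated assumptions: `SimulatedNode`, `syncStop` and `asyncStop` are hypothetical names, not Ignite's API. `SimulatedNode.stop()` only mimics `G.stop(igniteInstanceName, false)` by refusing new jobs and waiting for running ones, and a timeout is added solely so the demo terminates (the real call would block). The asynchronous variant is one assumed way of breaking the cycle, not necessarily what the pull request does.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class StopGridDeadlockDemo {

    /** Hypothetical stand-in for a node; NOT Ignite's API. */
    static final class SimulatedNode {
        // Pool that runs "compute jobs" sent to this node.
        private final ExecutorService computePool = Executors.newFixedThreadPool(2);

        <T> Future<T> execute(Callable<T> job) {
            return computePool.submit(job);
        }

        /** Mimics a graceful stop (cancel = false): no new jobs, wait for running ones. */
        boolean stop(long timeoutMs) throws InterruptedException {
            computePool.shutdown();
            return computePool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
        }

        boolean isStopped() {
            return computePool.isTerminated();
        }
    }

    /** The stop job calls stop() from inside a compute job, so it waits for itself. */
    static boolean syncStop(SimulatedNode node) throws Exception {
        // The pool cannot terminate while this very job is running,
        // so awaitTermination() can only time out and stop() returns false.
        Future<Boolean> stopJob = node.execute(() -> node.stop(500));
        // The caller waits synchronously for the job's result.
        return stopJob.get();
    }

    /** The stop job only starts the stop on a separate thread and returns at once. */
    static boolean asyncStop(SimulatedNode node) throws Exception {
        Future<Thread> stopJob = node.execute(() -> {
            Thread stopper = new Thread(() -> {
                try {
                    node.stop(5_000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            stopper.start();
            return stopper;
        });
        // Returns promptly because the job itself has finished.
        Thread stopper = stopJob.get();
        // Stand-in for the caller waiting until the remote node has gone down.
        stopper.join();
        return node.isStopped();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("sync stop completed: " + syncStop(new SimulatedNode()));
        System.out.println("async stop completed: " + asyncStop(new SimulatedNode()));
    }
}
```

Running `main` should print `sync stop completed: false` and then `async stop completed: true`. In the synchronous case the job waits for a pool that cannot drain while the job is still running. In the asynchronous case the job returns before anything waits on the pool, so the pool drains and terminates.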