Please address the comments in the PR. I did not fully understand why the sync GridStopMessage message was lost while the async one succeeds. We probably need to discuss it briefly.
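To make the suspected self-wait easier to discuss, here is a minimal sketch in plain Java. It is not Ignite code and I am not claiming it is what the PR does: the class `StopFromJobSketch`, the `Node` stand-in, and both methods are hypothetical names, and a single-thread `ExecutorService` only approximates "a node that waits for its compute jobs before stopping" (the *cancel = false* case described in the quoted message). It does not explain why the sync message was lost; it only shows a job waiting on itself versus a job handing the stop to another thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class StopFromJobSketch {
    /** Hypothetical stand-in for a node: a pool that runs "compute jobs". */
    static final class Node {
        final ExecutorService jobs = Executors.newSingleThreadExecutor();

        /** Models a stop that waits for running jobs; true if stopped within the timeout. */
        boolean stop(long timeoutMs) throws InterruptedException {
            jobs.shutdown(); // accept no new jobs, let running ones finish
            return jobs.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
        }
    }

    /** The job stops the node synchronously, so the stop waits for the job itself. */
    static boolean stopInsideJob() throws Exception {
        Node node = new Node();
        Callable<Boolean> job = () -> node.stop(200); // can only time out
        return node.jobs.submit(job).get();
    }

    /** The job only triggers the stop on another thread and returns at once. */
    static boolean stopFromSeparateThread() throws Exception {
        Node node = new Node();
        CompletableFuture<Boolean> stopped = new CompletableFuture<>();
        Callable<Void> job = () -> {
            Thread stopper = new Thread(() -> {
                try {
                    stopped.complete(node.stop(2000));
                } catch (InterruptedException e) {
                    stopped.completeExceptionally(e);
                }
            });
            stopper.start();
            return null; // the job finishes, so the stop has nothing left to wait for
        };
        node.jobs.submit(job).get();
        return stopped.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("stop inside job succeeded: " + stopInsideJob());
        System.out.println("stop from separate thread succeeded: " + stopFromSeparateThread());
    }
}
```

In this model the first call reports false, because the stop can only time out while its own job is still running, and the second reports true. Whether the real StopGridTask behaves the same way is exactly what I would like to confirm in the discussion.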
Thu, Mar 1, 2018 at 12:11, Vyacheslav Daradur <daradu...@gmail.com>:

> Thank you, Dmitry!
>
> I'll join this review soon.
>
> On Thu, Mar 1, 2018 at 12:07 PM, Dmitry Pavlov <dpavlov....@gmail.com>
> wrote:
> > Hi Vyacheslav,
> >
> > I will take a look, but first of all I am going to review
> > https://reviews.ignite.apache.org/ignite/review/IGNT-CR-502 - it is
> > an impactful change in the testing framework. I hope you will also
> > join this review.
> >
> > Sincerely,
> > Dmitry Pavlov
> >
> >
> > Thu, Mar 1, 2018 at 11:13, Vyacheslav Daradur <daradu...@gmail.com>:
> >>
> >> Hi, Dmitry, could you please review it, because you are one of the
> >> most experienced people in the testing framework.
> >>
> >> Please see the comment in Jira, because it is nicely formatted there.
> >>
> >> On Thu, Feb 22, 2018 at 11:56 AM, Vyacheslav Daradur
> >> <daradu...@gmail.com> wrote:
> >> > Hi Igniters!
> >> >
> >> > I have investigated the issue [1] and found that stopping a node in
> >> > a separate JVM may leave a thread stuck or leave a system process
> >> > alive after the test has finished.
> >> > The main reason is *StopGridTask*, which we send from the node in
> >> > the local JVM to the node in the separate JVM via remote computing.
> >> > We send the job synchronously to be sure that the node will be
> >> > stopped, but the job synchronously calls
> >> > *G.stop(igniteInstanceName, cancel)* with *cancel = false*, which
> >> > means the node must wait for compute jobs to finish before it goes
> >> > down, and that leads to a kind of deadlock. Using *cancel = true*
> >> > would solve the issue but may break some tests’ logic; for this
> >> > reason, I've reworked the method’s synchronization logic [2].
> >> >
> >> > We have not noticed this before because we use only
> >> > *stopAllGrids()* in our tests, which stops the local JVM without
> >> > waiting for nodes in other JVMs.
> >> > I believe this fix should reduce the number of flaky tests on
> >> > TeamCity, especially those that fail because a cluster from the
> >> > previous test has not been stopped properly.
> >> >
> >> > CI tests [3] look a bit better than in master.
> >> > Please review the prepared PR [2] and share your thoughts.
> >> >
> >> > [1] https://issues.apache.org/jira/browse/IGNITE-5910
> >> > [2] https://github.com/apache/ignite/pull/2382
> >> > [3] https://ci.ignite.apache.org/viewLog.html?buildId=1105939
> >> >
> >> >
> >> > On Fri, Aug 4, 2017 at 11:41 AM, Vyacheslav Daradur
> >> > <daradu...@gmail.com> wrote:
> >> >> Hi Igniters,
> >> >>
> >> >> While working on my task, I found a bug when calling the method
> >> >> #stopGrid(name): it produced a ClassCastException. I created a
> >> >> ticket [1].
> >> >>
> >> >> After it was fixed [2], I saw that nodes which were started in a
> >> >> separate JVM could remain as operating system processes.
> >> >> That was fixed too, but I am not sure whether it was fixed in the
> >> >> proper way.
> >> >>
> >> >> Could someone review it?
> >> >>
> >> >> [1] https://issues.apache.org/jira/browse/IGNITE-5910
> >> >> [2] https://github.com/apache/ignite/pull/2382
> >> >>
> >> >> --
> >> >> Best Regards, Vyacheslav D.
> >> >
> >> >
> >> >
> >> > --
> >> > Best Regards, Vyacheslav D.
> >>
> >>
> >>
> >> --
> >> Best Regards, Vyacheslav D.
> >
>
> --
> Best Regards, Vyacheslav D.
>