Hi Igniters!

I have investigated the issue [1] and found that stopping a node in a
separate JVM may leave a thread stuck or leave a system process alive
after the test.
The main reason is *StopGridTask*, which we send from the node in the
local JVM to the node in the separate JVM via remote computing.
We send the job synchronously to be sure that the node will be stopped,
but the job synchronously calls *G.stop(igniteInstanceName, cancel)*
with *cancel = false*. That means the node must wait for compute jobs to
complete before it goes down, which leads to a kind of deadlock, since
the stop job itself is still running on that node. Using *cancel = true*
would solve the issue but may break some tests' logic; for this reason,
I've reworked the method's synchronization logic [2].
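
To illustrate the waiting problem described above, here is a minimal sketch that uses a plain Java thread pool as a stand-in for the remote node. It is an analogy only: it does not use any Ignite API, the class and method names are hypothetical, and the second method shows just one possible way to avoid the self-wait, not necessarily what the PR [2] does.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Analogy only: a single-thread pool stands in for the node in the separate JVM. */
public class StopFromJobSketch {
    /**
     * The job itself requests a graceful stop and then waits for all running
     * jobs to finish. The job is one of those running jobs, so it waits for itself.
     * A timeout is used here so the sketch returns instead of hanging.
     */
    static boolean stopInsideJob(long timeoutMs) throws Exception {
        ExecutorService node = Executors.newSingleThreadExecutor();
        Future<Boolean> job = node.submit(() -> {
            node.shutdown(); // graceful stop, comparable to cancel = false
            // Cannot succeed while this very job is still running.
            return node.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
        });
        return job.get(); // the caller waits synchronously for the job
    }

    /**
     * The job only acknowledges the request and returns; the graceful stop
     * is performed afterwards, outside of any job running on the pool.
     */
    static boolean stopAfterJob(long timeoutMs) throws Exception {
        ExecutorService node = Executors.newSingleThreadExecutor();
        node.submit(() -> "stop requested").get();
        node.shutdown();
        return node.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("stopped from inside the job: " + stopInsideJob(200));
        System.out.println("stopped after the job returned: " + stopAfterJob(200));
    }
}
```

In the first method the graceful stop cannot complete within the timeout because the job that requested it is still running; without a timeout it would wait forever. In the second method nothing is running on the pool when the stop is requested, so it terminates promptly.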

We have not noticed this before because our tests use only
*stopAllGrids()*, which stops the local JVM without waiting for nodes in
other JVMs.
I believe this fix should reduce the number of flaky tests on
TeamCity, especially those that fail because a cluster from the
previous test has not been stopped properly.

The CI tests [3] look a bit better than in master.
Please review the prepared PR [2] and share your thoughts.

[1] https://issues.apache.org/jira/browse/IGNITE-5910
[2] https://github.com/apache/ignite/pull/2382
[3] https://ci.ignite.apache.org/viewLog.html?buildId=1105939

On Fri, Aug 4, 2017 at 11:41 AM, Vyacheslav Daradur <daradu...@gmail.com> wrote:
> Hi Igniters,
> While working on my task I found a bug when calling the method
> #stopGrid(name): it produced a ClassCastException. I created a ticket [1].
> After it was fixed [2], I saw that nodes which were started in a separate
> JVM could remain alive as operating system processes.
> That was fixed too, but I am not sure whether it is fixed in the proper way.
> Could someone review it?
> [1] https://issues.apache.org/jira/browse/IGNITE-5910
> [2] https://github.com/apache/ignite/pull/2382
> --
> Best Regards, Vyacheslav D.

Best Regards, Vyacheslav D.
