Yakov, thank you for the advice.

Thread.sleep alone was not enough, but a latch combined with a future gave
me a way to build a reproducer.

I have created PR [1] against my master to show a test and a modification of
ServerImpl that lets me slow down execution inside the dangerous section.
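
For illustration only, here is a minimal sketch of the latch + future
technique in general. It is not the code from the PR: the Component class,
its two latches and the method names are hypothetical, and a real test would
put such hooks into the code under test instead.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class LatchWindowDemo {
    /** Hypothetical component with a two-step, non-atomic update (not an Ignite class). */
    static final class Component {
        // Test-only hooks; assumptions made for this sketch.
        final CountDownLatch inside = new CountDownLatch(1);
        final CountDownLatch resume = new CountDownLatch(1);

        private volatile int a;
        private volatile int b;

        void update() throws InterruptedException {
            a = 1;
            inside.countDown(); // tell the test we are in the middle of the update
            resume.await();     // stay here until the test releases us
            b = 1;
        }

        boolean consistent() {
            return a == b;
        }
    }

    /** Returns what an observer sees while the worker is parked inside the update. */
    static boolean consistentInsideWindow() throws Exception {
        Component c = new Component();
        ExecutorService exec = Executors.newSingleThreadExecutor();
        try {
            // The future lets the test surface any exception thrown by the worker.
            Future<?> fut = exec.submit(() -> {
                c.update();
                return null;
            });

            if (!c.inside.await(5, TimeUnit.SECONDS))
                throw new IllegalStateException("Worker never reached the window");

            boolean seen = c.consistent(); // observed while a == 1 and b == 0

            c.resume.countDown();
            fut.get(5, TimeUnit.SECONDS);

            return seen;
        }
        finally {
            exec.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(consistentInsideWindow()); // false
    }
}
```

Unlike a sleep, the latch pair makes the window deterministic: the observer
runs only after the worker has entered the section and before it leaves.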

The test code is a bit long, but it basically consists of two parts:

In the first part, I randomly start and stop nodes until a server begins
executing the dangerous code that I described in the first message.

In the second part, I wait until the first part produces this situation, and
then I call a public method of ServerImpl, which fails with an exception:

java.lang.AssertionError: Invalid node order: TcpDiscoveryNode [id=f6bf048d-378b-4960-94cb-84e3d3300002, addrs=[127.0.0.1], sockAddrs=[/127.0.0.1:47502], discPort=47502, order=0, intOrder=2, lastExchangeTime=1524836605995, loc=false, ver=2.5.0#20180426-sha1:34e22396, isClient=false]
    at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing$1.apply(TcpDiscoveryNodesRing.java:52)
    at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing$1.apply(TcpDiscoveryNodesRing.java:49)
    at org.apache.ignite.internal.util.lang.GridFunc.isAll(GridFunc.java:2014)
    at org.apache.ignite.internal.util.IgniteUtils.arrayList(IgniteUtils.java:9679)
    at org.apache.ignite.internal.util.IgniteUtils.arrayList(IgniteUtils.java:9652)
    at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.nodes(TcpDiscoveryNodesRing.java:590)
    at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.visibleRemoteNodes(TcpDiscoveryNodesRing.java:164)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl.getRemoteNodes(ServerImpl.java:304)

As I said in the first message, the problem arises because the current code
changes the local node's internal order, which breaks the sorting of the
TcpDiscoveryNodesRing.nodes collection.
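
To make this kind of failure concrete, here is a minimal standalone sketch.
It assumes a collection sorted by a mutable key; the Node class and its
order field are hypothetical stand-ins, not the Ignite classes, and the
sketch does not claim to reproduce the actual Ignite code path.

```java
import java.util.TreeSet;

public class SortedSetMutationDemo {
    /** Hypothetical stand-in for a node; not TcpDiscoveryNode. */
    static final class Node implements Comparable<Node> {
        final String name;
        long order; // mutable sort key

        Node(String name, long order) {
            this.name = name;
            this.order = order;
        }

        @Override public int compareTo(Node o) {
            return Long.compare(order, o.order);
        }
    }

    /** Checks that iteration yields non-decreasing order values. */
    static boolean isSorted(Iterable<Node> nodes) {
        long prev = Long.MIN_VALUE;

        for (Node n : nodes) {
            if (n.order < prev)
                return false;

            prev = n.order;
        }

        return true;
    }

    public static void main(String[] args) {
        TreeSet<Node> ring = new TreeSet<>();

        Node a = new Node("a", 1);

        ring.add(a);
        ring.add(new Node("b", 2));
        ring.add(new Node("c", 3));

        System.out.println(isSorted(ring)); // true

        // Changing the key in place does not reposition the element in the set.
        a.order = 5;

        System.out.println(isSorted(ring)); // false: iteration now yields 5, 2, 3
    }
}
```

The set keeps the element where it was inserted, so any later check that
relies on the ordering can fail, even though nothing was added or removed.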

Is this reproducer convincing enough?

[1] Reproducer: https://github.com/SharplEr/ignite/pull/10/files



2018-02-13 20:17 GMT+03:00 Yakov Zhdanov <yzhda...@apache.org>:

> Alex, you can alter ServerImpl and insert a latch or thread.sleep(xxx)
> anywhere you like to show the incorrect behavior you describe.
>
> --Yakov
>
