Semen Boikov created IGNITE-4111:
------------------------------------

             Summary: Communication fails to send message if target node did 
not finish join process
                 Key: IGNITE-4111
                 URL: https://issues.apache.org/jira/browse/IGNITE-4111
             Project: Ignite
          Issue Type: Bug
          Components: general
            Reporter: Semen Boikov
             Fix For: 1.8


Currently the following scenario is possible:
- the joining node has sent a join request and waits for
TcpDiscoveryNodeAddFinishedMessage inside ServerImpl.joinTopology
- other nodes already see this node and can send messages to it (for example,
try to run a compute job on it)
- the joining node cannot receive the message: TcpCommunicationSpi hangs inside
'onFirstMessage' on the 'getSpiContext' call, so the sending node gets an error
while trying to establish the connection

Possible fix: if the SPI context is not available in onFirstMessage(),
TcpCommunicationSpi should send a special response indicating that this node
is not ready yet, and the sender should retry after some time.
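
The proposed handshake could be sketched roughly as follows. This is only an
illustration of the idea, not Ignite code: HandshakeResult, NODE_NOT_READY,
connectWithRetry and the boolean "spiContextAvailable" flag are all
hypothetical names, and the fixed retry delay is an assumption.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical sketch of the "not ready" reply; none of these names are Ignite APIs. */
public class NodeNotReadyRetrySketch {
    /** Hypothetical handshake outcomes. */
    public enum HandshakeResult { OK, NODE_NOT_READY }

    /**
     * Receiver side: instead of blocking until the SPI context appears,
     * reply immediately with a "not ready" marker.
     */
    public static HandshakeResult onFirstMessage(boolean spiContextAvailable) {
        return spiContextAvailable ? HandshakeResult.OK : HandshakeResult.NODE_NOT_READY;
    }

    /**
     * Sender side: repeat the handshake with a fixed delay between attempts.
     *
     * @return Number of attempts used, or -1 if the node never became ready.
     */
    public static int connectWithRetry(Supplier<HandshakeResult> handshake, int maxAttempts, long delayMs) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (handshake.get() == HandshakeResult.OK)
                return attempt;

            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMs);
                }
                catch (InterruptedException e) {
                    // Preserve the interrupt flag and give up.
                    Thread.currentThread().interrupt();

                    return -1;
                }
            }
        }

        return -1;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();

        // Simulated target node: its SPI context becomes available on the third handshake.
        int attempts = connectWithRetry(() -> onFirstMessage(calls.incrementAndGet() >= 3), 5, 10L);

        System.out.println("connected after " + attempts + " attempts");
    }
}
```

With this shape the sender can tell "node is still joining" apart from a real
connection failure and only report an error once the retries are exhausted.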

We also need to check internal code for places where a message can be sent to
a node unnecessarily. One such place is
GridCachePartitionExchangeManager.refreshPartitions: the message is sent to all
known nodes, but here we can filter by node order / finished exchange version.
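
A minimal sketch of such a filter is shown below. It is not the actual
refreshPartitions code: the Node class and the targets() helper are
hypothetical, and it assumes that a node whose join order is greater than the
topology version of the last finished exchange has not completed its join yet.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch of filtering message targets by node order. */
public class RefreshTargetFilterSketch {
    /** Hypothetical stand-in for a cluster node: an ID plus its join order. */
    public static final class Node {
        public final String id;
        public final long order;

        public Node(String id, long order) {
            this.id = id;
            this.order = order;
        }
    }

    /**
     * Keeps only nodes that joined at or before the last finished exchange
     * (assumption: such nodes are able to receive communication messages).
     */
    public static List<Node> targets(List<Node> knownNodes, long lastFinishedExchangeVer) {
        return knownNodes.stream()
            .filter(n -> n.order <= lastFinishedExchangeVer)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Node> known = Arrays.asList(new Node("a", 1), new Node("b", 2), new Node("c", 3));

        // Node "c" (order 3) is still joining: the last finished exchange is version 2.
        List<String> ids = targets(known, 2).stream().map(n -> n.id).collect(Collectors.toList());

        System.out.println(ids); // prints [a, b]
    }
}
```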




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
