[ 
https://issues.apache.org/jira/browse/IGNITE-8922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16546178#comment-16546178
 ] 

Denis Mekhanikov commented on IGNITE-8922:
------------------------------------------

[~dkarachentsev], losing discovery messages is much more harmful, since it may 
lead to the whole cluster being stuck.

On the other hand, if a node fails with an OOME, the problem is confined to 
that one node. And for that to happen, discard messages would have to go 
undelivered for a really long time, which is quite unlikely.

So I think nodes should either guarantee delivery of all discovery messages 
that are passed to them, or die.

> Discovery message delivery guarantee can be violated
> ----------------------------------------------------
>
>                 Key: IGNITE-8922
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8922
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.5
>            Reporter: Denis Mekhanikov
>            Assignee: Denis Mekhanikov
>            Priority: Critical
>             Fix For: 2.7
>
>         Attachments: PendingMessageResendTest.java
>
>
> Under certain circumstances, discovery messages may be delivered to only 
> some of the nodes.
> This happens because pending messages are not resent due to a data 
> inconsistency in the {{ServerImpl#PendingMessages}} class. If {{discardId}} or 
> {{customDiscardId}} points to a message that is not present in the queue, 
> then the other messages will be skipped and won't be resent. This may happen, 
> for example, when the queue in {{PendingMessages}} overflows.
> Please find attached a test that reproduces this problem.
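
The failure mode described above can be illustrated with a much simplified 
model. This is only a sketch, not the actual {{ServerImpl#PendingMessages}} 
code: the class name {{PendingMessagesSketch}}, the {{MAX_SIZE}} limit, and the 
use of plain {{Long}} values as message IDs are all assumptions made for 
illustration. It shows how a "skip everything up to discardId" loop skips the 
whole queue once the message that {{discardId}} refers to has been evicted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Simplified, hypothetical model of a bounded pending-message queue. */
public class PendingMessagesSketch {
    /** Hypothetical queue limit; the real limit in Ignite is different. */
    public static final int MAX_SIZE = 3;

    /** IDs of messages that were sent and may need to be resent. */
    private final Deque<Long> msgs = new ArrayDeque<>();

    /** ID of the last message known to be delivered everywhere, or null. */
    private Long discardId;

    /** Adds a message and evicts the oldest ones if the queue overflows. */
    public void add(long id) {
        msgs.addLast(id);

        while (msgs.size() > MAX_SIZE)
            msgs.pollFirst();
    }

    /** Records that everything up to and including {@code id} can be discarded. */
    public void discard(long id) {
        discardId = id;
    }

    /** Returns the messages that come after discardId in the queue. */
    public List<Long> messagesToResend() {
        List<Long> res = new ArrayList<>();

        // Skip until discardId is seen. If it is no longer in the queue,
        // 'skip' never becomes false and nothing is returned.
        boolean skip = discardId != null;

        for (Long id : msgs) {
            if (skip) {
                if (id.equals(discardId))
                    skip = false;

                continue;
            }

            res.add(id);
        }

        return res;
    }

    public static void main(String[] args) {
        PendingMessagesSketch healthy = new PendingMessagesSketch();
        healthy.add(1); healthy.add(2); healthy.add(3);
        healthy.discard(1);
        System.out.println("healthy:  " + healthy.messagesToResend()); // [2, 3]

        PendingMessagesSketch overflown = new PendingMessagesSketch();
        overflown.add(1); overflown.add(2); overflown.add(3);
        overflown.discard(1);
        overflown.add(4); // evicts message 1, which discardId still points to
        System.out.println("overflow: " + overflown.messagesToResend()); // []
    }
}
```

In the second case messages 2, 3 and 4 were never acknowledged, yet none of 
them is returned for resending, which matches the symptom reported in this 
issue.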



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
