[ https://issues.apache.org/jira/browse/IGNITE-8922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16546178#comment-16546178 ]
Denis Mekhanikov commented on IGNITE-8922:
------------------------------------------

[~dkarachentsev], losing discovery messages is much more harmful, since it may leave the whole cluster stuck. If a node fails with an OOME, on the other hand, only that one node is affected. And for that to happen, discard messages would have to go undelivered for a really long time, which is quite unlikely.

So I think nodes should either guarantee delivery of all discovery messages that are passed to them, or die.

> Discovery message delivery guarantee can be violated
> ----------------------------------------------------
>
>                 Key: IGNITE-8922
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8922
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.5
>            Reporter: Denis Mekhanikov
>            Assignee: Denis Mekhanikov
>            Priority: Critical
>             Fix For: 2.7
>
>         Attachments: PendingMessageResendTest.java
>
>
> Under certain circumstances discovery messages may be delivered to only
> some of the nodes.
> This happens because pending messages are not resent due to a data
> inconsistency in the {{ServerImpl#PendingMessages}} class. If {{discardId}}
> or {{customDiscardId}} points to a message that is not present in the queue,
> then the other messages will be skipped and won't be resent. It may happen,
> for example, when the queue in {{PendingMessages}} overflows.
> PFA a test that reproduces this problem.
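
To make the failure mode in the description concrete, here is a minimal sketch of how a dangling discard ID can suppress resending. It is a simplified, hypothetical model and not the actual {{ServerImpl#PendingMessages}} code: the class name {{PendingQueueSketch}}, its methods, the use of plain {{long}} IDs, and the evict-oldest overflow policy are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Simplified, hypothetical model of a bounded pending-message queue (not Ignite's code). */
public class PendingQueueSketch {
    private final int maxSize;

    /** Pending message IDs, oldest first. */
    private final Deque<Long> ids = new ArrayDeque<>();

    /** ID of the last message known to be delivered; null if none was discarded yet. */
    private Long discardId;

    public PendingQueueSketch(int maxSize) {
        this.maxSize = maxSize;
    }

    public void add(long id) {
        ids.addLast(id);

        // Assumed overflow policy: drop the oldest entries,
        // even if discardId still refers to one of them.
        while (ids.size() > maxSize)
            ids.pollFirst();
    }

    public void discard(long id) {
        discardId = id;
    }

    /** Skips entries up to and including discardId and returns the rest. */
    public List<Long> toResend() {
        List<Long> res = new ArrayList<>();
        boolean skip = discardId != null;

        for (Long id : ids) {
            if (skip) {
                // Skipping stops only when the discarded ID is actually found.
                if (id.equals(discardId))
                    skip = false;

                continue;
            }

            res.add(id);
        }

        return res;
    }

    public static void main(String[] args) {
        PendingQueueSketch q = new PendingQueueSketch(3);

        q.add(1);
        q.add(2);
        q.add(3);
        q.discard(1);

        System.out.println(q.toResend()); // [2, 3]

        q.add(4); // Overflow evicts message 1, so discardId now points at nothing.

        System.out.println(q.toResend()); // [] although 2, 3 and 4 are still pending
    }
}
```

In this sketch the skip flag is never cleared once the discarded ID has left the queue, so every remaining message is treated as already delivered. One possible remedy, again only as an assumption, would be to reset the discard ID whenever the entry it refers to is evicted.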