[ https://issues.apache.org/jira/browse/IGNITE-14085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536717#comment-17536717 ]
Aleksandr Polovtcev commented on IGNITE-14085:
----------------------------------------------
The fix looks awesome, thank you.
> Implement message recovery protocol over handshake
> --------------------------------------------------
>
> Key: IGNITE-14085
> URL: https://issues.apache.org/jira/browse/IGNITE-14085
> Project: Ignite
> Issue Type: Bug
> Reporter: Anton Kalashnikov
> Assignee: Semyon Danilov
> Priority: Major
> Labels: iep-66, ignite-3
> Time Spent: 3h 10m
> Remaining Estimate: 0h
>
> First of all, we should introduce a Communication Recovery Descriptor (CRD), a
> data structure that holds information about a specific connection between two
> nodes. It should hold the following data:
> * Connection id (because we may have multiple connections between two nodes)
> * Count of sent messages
> * Count of received messages
> * Count of acknowledgments received for sent messages
> * Count of acknowledgments sent for received messages
> * Queue of sent but not acknowledged messages
> Every connection must be bound to a recovery descriptor so that, in case of a
> connectivity failure, we can resend unacknowledged messages.
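
To make the list of fields above concrete, here is a minimal Java sketch of such a descriptor. The class name, method names and the generic message type are assumptions made for illustration, not the actual Ignite classes; only the fields themselves come from the description.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical per-connection recovery descriptor; all names are illustrative. */
final class RecoveryDescriptorSketch<M> {
    private final int connectionId; // several connections may exist between two nodes
    private long sentCount;         // messages sent
    private long receivedCount;     // messages received
    private long acksReceived;      // acknowledgements received for sent messages
    private long acksSent;          // acknowledgements sent for received messages
    private final Deque<M> unacknowledged = new ArrayDeque<>(); // sent, not yet acknowledged

    RecoveryDescriptorSketch(int connectionId) {
        this.connectionId = connectionId;
    }

    /** Registers an outgoing message before it is written to the channel. */
    synchronized void onSend(M message) {
        unacknowledged.addLast(message);
        sentCount++;
    }

    /** Snapshot of the messages to resend after a connectivity failure, oldest first. */
    synchronized List<M> unacknowledgedMessages() {
        return new ArrayList<>(unacknowledged);
    }

    int connectionId() { return connectionId; }
    synchronized long sentCount() { return sentCount; }
    synchronized long receivedCount() { return receivedCount; }
    synchronized long acksReceived() { return acksReceived; }
    synchronized long acksSent() { return acksSent; }
}
```

The queue is kept in send order so that a resend after reconnect preserves the original message order.
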
> The handshake process should be as follows:
> # Server receives incoming connection and sends its identity information
> (launch id, consistent id)
> # Client receives server information and sends its identity and recovery
> information (connection id, number of received messages)
> # Server receives client's recovery information and sends its own recovery
> information
> # Server sends all unacknowledged messages if any exist
> # Client sends all unacknowledged messages if any exist
> The connection should be considered ready for work after all the
> unacknowledged messages are sent and acknowledged.
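
One possible reading of the resend steps in the handshake above, sketched in Java. The description only says that each side sends its unacknowledged messages. Using the peer's "number of received messages" to skip messages that were delivered but not yet acknowledged is an assumption made here to avoid handling a message twice. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that decides what to resend after the handshake. */
final class HandshakeResendSketch {
    /**
     * @param acknowledged   how many of our messages were acknowledged before the failure
     * @param unacknowledged our queue of sent but unacknowledged messages, oldest first
     * @param peerReceived   received-message count reported by the peer in the handshake
     * @return the messages that the peer has not seen yet, oldest first
     */
    static <M> List<M> messagesToResend(long acknowledged, List<M> unacknowledged, long peerReceived) {
        // Messages the peer received but whose acknowledgement never reached us.
        long alreadyDelivered = peerReceived - acknowledged;

        if (alreadyDelivered < 0 || alreadyDelivered > unacknowledged.size()) {
            throw new IllegalStateException("Inconsistent counters: acknowledged=" + acknowledged
                + ", queued=" + unacknowledged.size() + ", peerReceived=" + peerReceived);
        }

        // Skip the delivered prefix (assumed behaviour) and resend the rest.
        return new ArrayList<>(unacknowledged.subList((int) alreadyDelivered, unacknowledged.size()));
    }
}
```

For example, with 3 messages acknowledged, a queue of 3 pending messages and a peer that reports 4 received, only the last two queued messages would be resent under this assumption.
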
> The process of sending and receiving a message should also change as follows:
> * Every message we are going to send must first be added to the communication
> recovery descriptor's message queue, and the sent message counter must be
> updated.
> * After receiving a message we should send an acknowledgement (we could also
> send a batch acknowledgement, for example for every 5 received messages send
> 1 ack) and update the received messages counter and the sent acknowledgements
> counter.
> * After receiving an acknowledgement message we must remove the sent message
> from the CRD's queue and update the appropriate counter.
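
The send, receive and acknowledge steps above could look roughly like the following Java sketch. It assumes cumulative acknowledgements (an ack carries the total number of messages received so far) and a configurable batch size. Both are illustrative choices, and all names are hypothetical rather than taken from the Ignite code base.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.OptionalLong;

/** Hypothetical counter handling for send, receive and batched acknowledgements. */
final class AckFlowSketch<M> {
    private final int ackBatchSize;
    private final Deque<M> unacknowledged = new ArrayDeque<>();
    private long sentCount;
    private long receivedCount;
    private long acksReceived; // how many of our sent messages the peer has acknowledged
    private long acksSent;     // how many received messages we have acknowledged

    AckFlowSketch(int ackBatchSize) {
        this.ackBatchSize = ackBatchSize;
    }

    /** Queues the message and bumps the sent counter before the actual send. */
    synchronized void onSend(M message) {
        unacknowledged.addLast(message);
        sentCount++;
    }

    /** Returns the cumulative count to acknowledge once a batch is full, otherwise empty. */
    synchronized OptionalLong onReceive() {
        receivedCount++;
        if (receivedCount - acksSent >= ackBatchSize) {
            acksSent = receivedCount;
            return OptionalLong.of(receivedCount);
        }
        return OptionalLong.empty();
    }

    /** Handles a cumulative ack; stale or duplicate acks are ignored. */
    synchronized void onAck(long ackedUpTo) {
        if (ackedUpTo > sentCount) {
            throw new IllegalArgumentException("Ack for " + ackedUpTo + " messages, only " + sentCount + " sent");
        }
        while (acksReceived < ackedUpTo) {
            unacknowledged.removeFirst();
            acksReceived++;
        }
    }

    synchronized int unacknowledgedCount() {
        return unacknowledged.size();
    }
}
```

With a batch size of 5, the first four calls to `onReceive()` return an empty value and the fifth returns 5, which is the single ack sent for the whole batch.
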
> Extra attention should be paid to counter management, as messages are not
> idempotent and handling the same message twice can lead to undefined
> behaviour. Some messages should not be counted at all (and thus should not be
> acknowledged), for example acknowledgement messages and handshake messages,
> and possibly others.
> It should also be noted that the current messaging API has a public method
> for sending a message without requiring an acknowledgement; this should be
> handled appropriately.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)