Ivan Bessonov created IGNITE-20081:
--------------------------------------

             Summary: Implement "weakSend" properly, add "weakInvoke"
                 Key: IGNITE-20081
                 URL: https://issues.apache.org/jira/browse/IGNITE-20081
             Project: Ignite
          Issue Type: Improvement
            Reporter: Ivan Bessonov


The idea: some components, such as RAFT, can tolerate lost messages. 
Strict message delivery guarantees may not be a good fit for such 
components.

However, the current implementation of "weakSend" is just a wrapper around 
"send" that doesn't return a future. This API must be redesigned and properly 
implemented.
h3. API
 * {{CompletableFuture<Void> weakSend(ClusterNode recipient, NetworkMessage msg, long timeout);}}
 * {{CompletableFuture<NetworkMessage> weakInvoke(ClusterNode recipient, NetworkMessage msg, long timeout);}}

The returned futures complete in one of two cases:
 * an ack or a response has been received
 * the timeout has expired

This means that a huge timeout is probably a bad idea for such messages.
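
The two completion cases can be illustrated with a minimal sketch built on 
plain {{CompletableFuture}} (Java 9+). It is not the Ignite implementation: 
the class {{WeakSendTimeoutSketch}} and its methods are hypothetical names 
introduced here, and the "ack" is simulated by completing a future by hand.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch, not the Ignite API: a future completed by an ack or by a timeout. */
final class WeakSendTimeoutSketch {
    /**
     * Returns a future that completes normally when {@code ackFuture} does, or
     * exceptionally with TimeoutException once {@code timeoutMillis} elapses,
     * whichever happens first.
     */
    static CompletableFuture<Void> withTimeout(CompletableFuture<Void> ackFuture, long timeoutMillis) {
        // copy() keeps the internal ack future separate from the future handed to the caller.
        return ackFuture.copy().orTimeout(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    /** Waits for the future and reports whether it failed with a TimeoutException. */
    static boolean timedOut(CompletableFuture<?> future) {
        try {
            future.join();
            return false;
        } catch (CompletionException e) {
            return e.getCause() instanceof TimeoutException;
        }
    }

    public static void main(String[] args) {
        // Case 1: the (simulated) ack arrives before the timeout.
        CompletableFuture<Void> ack = new CompletableFuture<>();
        CompletableFuture<Void> delivered = withTimeout(ack, 1_000);
        ack.complete(null);
        System.out.println("delivered timed out: " + timedOut(delivered));

        // Case 2: no ack ever arrives, so the timeout completes the future.
        CompletableFuture<Void> lost = withTimeout(new CompletableFuture<>(), 50);
        System.out.println("lost timed out: " + timedOut(lost));
    }
}
```

Since the caller is only released by an ack or by the timeout, the timeout 
value directly bounds how long a lost message keeps its future pending.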
h3. Implementation
 * with a stable and fast connection, weak communication should behave the same 
as regular communication from the client's standpoint;
 * if the message queue for the given connection is full, we may/should:
 ** remove from the existing queue all weak messages that have definitely not 
been sent yet;
 ** reject new weak messages;
 ** maybe throttle, but this is out of scope;
 * alternatively, if the connection breaks, we may start removing weak messages 
from the queue and/or rejecting new ones.
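
One possible reading of the "queue is full" policy is sketched below. 
Everything in it is an assumption made for illustration: the class 
{{WeakAwareQueueSketch}}, the {{Outbound}} type, the {{onDropped}} callback and 
the choice to apply both options (evict queued weak messages and reject the new 
weak one) at the same time. Throttling is left out, as in the description.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of an outbound queue that sheds weak messages when it is full. */
final class WeakAwareQueueSketch {
    /** A queued message; {@code weak} marks messages that are allowed to be lost. */
    static final class Outbound {
        final String payload;
        final boolean weak;

        Outbound(String payload, boolean weak) {
            this.payload = payload;
            this.weak = weak;
        }
    }

    private final Deque<Outbound> unsent = new ArrayDeque<>();
    private final int capacity;
    private final Consumer<Outbound> onDropped;

    /** {@code onDropped} is called for every weak message that is evicted or rejected. */
    WeakAwareQueueSketch(int capacity, Consumer<Outbound> onDropped) {
        this.capacity = capacity;
        this.onDropped = onDropped;
    }

    /** Enqueues the message; returns false if it was rejected. */
    synchronized boolean offer(Outbound msg) {
        if (unsent.size() >= capacity) {
            // Everything in this queue is still unsent, so queued weak messages are safe to drop.
            for (Iterator<Outbound> it = unsent.iterator(); it.hasNext(); ) {
                Outbound queued = it.next();
                if (queued.weak) {
                    it.remove();
                    onDropped.accept(queued);
                }
            }
            if (msg.weak) {
                onDropped.accept(msg); // Reject new weak messages while under pressure.
                return false;
            }
        }
        // Strong messages are always accepted in this sketch, even above capacity.
        unsent.addLast(msg);
        return true;
    }

    /** Takes the next message to write to the connection, or null if there is none. */
    synchronized Outbound poll() {
        return unsent.pollFirst();
    }

    synchronized int size() {
        return unsent.size();
    }

    public static void main(String[] args) {
        List<String> dropped = new ArrayList<>();
        WeakAwareQueueSketch queue = new WeakAwareQueueSketch(2, m -> dropped.add(m.payload));
        queue.offer(new Outbound("w1", true));
        queue.offer(new Outbound("s1", false));
        boolean accepted = queue.offer(new Outbound("w2", true)); // Queue is full at this point.
        System.out.println("accepted=" + accepted + ", dropped=" + dropped + ", size=" + queue.size());
    }
}
```

In a real implementation the {{onDropped}} callback would be the natural place 
to complete the corresponding "weakSend"/"weakInvoke" futures exceptionally. 
The same eviction loop could also be triggered when the connection breaks.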

Weak send and weak invoke may behave differently.

For example, "weakSend" requires an ack, so it has to be marked with a "message 
number" in the recovery descriptor. 
"weakInvoke", however, doesn't need an ack; it only requires a response (it 
already has a "correlationId"), so not re-sending it after a reconnect shouldn't 
break the recovery protocol. It doesn't need a "message number" in the recovery 
descriptor, so we can save some resources by reducing the number of acks.

One more important thing:
 * when an invoke future fails with a timeout exception, we must clean up the 
corresponding correlation ID from the map;
 * when we receive a "node left" event for some node, we should complete all 
futures returned for that node with some "NodeLeftException", and clean up all 
of its correlation IDs from the map as well.
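
Both cleanup rules can be sketched with a single completion callback that 
removes the map entry however the future ends. This is a hedged illustration 
only: {{CorrelationMapSketch}}, {{Invocation}}, {{register}}, {{onResponse}} and 
{{onNodeLeft}} are hypothetical names, node IDs and responses are plain strings, 
and {{NodeLeftException}} is defined locally because the description only says 
"some NodeLeftException".

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of a correlation map that cleans itself up on timeout and on "node left". */
final class CorrelationMapSketch {
    /** Placeholder exception type; the real one is not specified in the description. */
    static final class NodeLeftException extends RuntimeException {
        NodeLeftException(String nodeId) {
            super("Node left: " + nodeId);
        }
    }

    /** What the caller gets back: the correlation ID and the future for the response. */
    static final class Invocation {
        final long correlationId;
        final CompletableFuture<String> future;

        Invocation(long correlationId, CompletableFuture<String> future) {
            this.correlationId = correlationId;
            this.future = future;
        }
    }

    private static final class Pending {
        final String nodeId;
        final CompletableFuture<String> raw;

        Pending(String nodeId, CompletableFuture<String> raw) {
            this.nodeId = nodeId;
            this.raw = raw;
        }
    }

    private final ConcurrentMap<Long, Pending> pending = new ConcurrentHashMap<>();
    private final AtomicLong idGenerator = new AtomicLong();

    /** Registers a pending invocation to {@code nodeId} that fails after {@code timeoutMillis}. */
    Invocation register(String nodeId, long timeoutMillis) {
        long id = idGenerator.incrementAndGet();
        CompletableFuture<String> raw =
                new CompletableFuture<String>().orTimeout(timeoutMillis, TimeUnit.MILLISECONDS);
        pending.put(id, new Pending(nodeId, raw));
        // The entry is removed on response, timeout and node-left alike; the caller's
        // future completes only after this cleanup has run.
        CompletableFuture<String> result = raw.whenComplete((response, error) -> pending.remove(id));
        return new Invocation(id, result);
    }

    /** Completes the matching invocation, if it is still pending. */
    void onResponse(long correlationId, String response) {
        Pending p = pending.get(correlationId);
        if (p != null) {
            p.raw.complete(response);
        }
    }

    /** Fails every pending invocation addressed to the node that left. */
    void onNodeLeft(String nodeId) {
        for (Pending p : pending.values()) {
            if (p.nodeId.equals(nodeId)) {
                p.raw.completeExceptionally(new NodeLeftException(nodeId));
            }
        }
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        CorrelationMapSketch map = new CorrelationMapSketch();
        Invocation invocation = map.register("node-1", 10_000);
        map.onNodeLeft("node-1");
        System.out.println("failed=" + invocation.future.isCompletedExceptionally()
                + ", pending=" + map.pendingCount());
    }
}
```

Attaching the removal to the future itself, rather than to each event handler, 
means no completion path can forget the cleanup.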

h3. Integration

Integration will be done separately. For now, all we need is a set of unit tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
