Ivan Bessonov created IGNITE-20081:
--------------------------------------
Summary: Implement "weakSend" properly, add "weakInvoke"
Key: IGNITE-20081
URL: https://issues.apache.org/jira/browse/IGNITE-20081
Project: Ignite
Issue Type: Improvement
Reporter: Ivan Bessonov
There was an idea: some components, like RAFT, are allowed to lose messages.
Strict guarantees for message delivery may not be a good fit for such
components.
However, the current implementation of "weakSend" is just a wrapper around "send"
that doesn't return a future. This API must be redesigned and properly implemented.
h3. API
* {{CompletableFuture<Void> weakSend(ClusterNode recipient, NetworkMessage msg, long timeout);}}
* {{CompletableFuture<NetworkMessage> weakInvoke(ClusterNode recipient, NetworkMessage msg, long timeout);}}
The returned futures complete in one of two cases:
* an ack or a response has been received
* the timeout has been exceeded
This means that a huge timeout is probably a bad idea for such messages.
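The two completion cases above can be sketched with plain {{CompletableFuture}} (Java 9+). This is only an illustration: {{WeakFutures}} and {{withTimeout}} are hypothetical names, not Ignite API, and the sketch assumes the timeout case completes the future exceptionally with a {{TimeoutException}}.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, not part of Ignite: bounds a pending ack/response future with a timeout. */
final class WeakFutures {
    private WeakFutures() {
    }

    /**
     * Returns a future that completes with the value of {@code pending} if the ack or
     * response arrives in time, or fails with a TimeoutException after {@code timeoutMillis}.
     */
    static <T> CompletableFuture<T> withTimeout(CompletableFuture<T> pending, long timeoutMillis) {
        // copy() keeps the caller's original future untouched when the timeout fires.
        return pending.copy().orTimeout(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}
```

With a helper like this, a caller that picks a very large timeout keeps the future (and everything it references) alive for that long when the message is lost, which is why small timeouts suit weak messages better.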
h3. Implementation
* with stable and fast connection, weak communication should work the same way
from the client standpoint;
* if a message queue for the given connection is full, we may/should:
** remove from the existing queue all weak messages that have definitely not been
sent yet;
** reject new weak messages;
** maybe throttle, but this is out of scope;
* alternatively, if connection breaks, we may start removing weak messages
from the queue, and/or rejecting new ones.
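A minimal sketch of the "queue is full" policy from the list above. {{OutboundQueue}} is a hypothetical class, not the actual Ignite connection queue. It assumes that everything still in the queue has not been sent yet, and that strong (non-weak) messages are always accepted, because backpressure for them is outside this ticket.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical outbound queue, not Ignite code: weak messages are sacrificed when it is full. */
final class OutboundQueue {
    /** A queued, not yet sent message; {@code weak} marks it as droppable. */
    private static final class Entry {
        final String payload;
        final boolean weak;

        Entry(String payload, boolean weak) {
            this.payload = payload;
            this.weak = weak;
        }
    }

    private final Deque<Entry> queue = new ArrayDeque<>();
    private final int capacity;

    OutboundQueue(int capacity) {
        this.capacity = capacity;
    }

    /** Returns true if the message was queued, false if it was rejected. */
    synchronized boolean offer(String payload, boolean weak) {
        if (queue.size() >= capacity) {
            // Queue is full: drop every unsent weak message.
            // A real implementation would also fail the futures of the dropped messages.
            queue.removeIf(e -> e.weak);

            if (weak) {
                return false; // New weak messages are rejected while under pressure.
            }
        }

        queue.addLast(new Entry(payload, weak));
        return true;
    }

    synchronized int size() {
        return queue.size();
    }
}
```

The same {{removeIf}} step could be triggered from a "connection broken" callback instead of from {{offer}}, which corresponds to the alternative in the last bullet.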
Weak send and weak invoke may behave differently.
For example, "weakSend" requires an ack, so it has to be marked with a "message
number" in the recovery descriptor.
"weakInvoke", however, doesn't need an ack; it only requires a response (and already
has a "correlationId"), so not re-sending it after a reconnect shouldn't break the
recovery protocol. It therefore doesn't need a "message number" in the recovery
descriptor, and we can save some resources by reducing the number of acks.
Two more important things:
* when an invoke future fails with a timeout exception, we must clean up the
corresponding correlation ID from the map;
* when we receive a "node left" event for some node, we should complete all
futures returned for that node with some "NodeLeftException", and clean up all its
correlation IDs from the map as well.
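The cleanup rules above could look roughly like this. It is a sketch under assumptions: {{PendingInvokes}}, its methods, the string node IDs and payloads, and this particular {{NodeLeftException}} are all hypothetical stand-ins, not existing Ignite classes.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

/** Hypothetical registry of in-flight invokes keyed by correlation ID (not Ignite code). */
final class PendingInvokes {
    /** Hypothetical exception for futures whose recipient left the cluster. */
    static final class NodeLeftException extends RuntimeException {
        NodeLeftException(String nodeId) {
            super("Node left: " + nodeId);
        }
    }

    private static final class Pending {
        final String nodeId;
        final CompletableFuture<String> future;

        Pending(String nodeId, CompletableFuture<String> future) {
            this.nodeId = nodeId;
            this.future = future;
        }
    }

    private final Map<Long, Pending> byCorrelationId = new ConcurrentHashMap<>();

    /** Registers an invoke; the returned future completes only after the map entry is gone. */
    CompletableFuture<String> register(long correlationId, String nodeId, long timeoutMillis) {
        CompletableFuture<String> inner = new CompletableFuture<String>()
                .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS);
        Pending pending = new Pending(nodeId, inner);
        byCorrelationId.put(correlationId, pending);

        // Response, timeout, or node left: in every case the correlation ID is removed.
        return inner.whenComplete((res, err) -> byCorrelationId.remove(correlationId, pending));
    }

    /** Completes the matching invoke, if it is still pending. */
    void onResponse(long correlationId, String response) {
        Pending pending = byCorrelationId.get(correlationId);
        if (pending != null) {
            pending.future.complete(response);
        }
    }

    /** Fails every pending invoke addressed to the node that left. */
    void onNodeLeft(String nodeId) {
        // ConcurrentHashMap iteration tolerates the removals triggered by completion.
        for (Pending pending : byCorrelationId.values()) {
            if (pending.nodeId.equals(nodeId)) {
                pending.future.completeExceptionally(new NodeLeftException(nodeId));
            }
        }
    }

    int pendingCount() {
        return byCorrelationId.size();
    }
}
```

Tying the removal to the future's completion, rather than to each individual event handler, means a new completion path cannot forget the cleanup.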
h3. Integration
Integration will be done separately. All we need, for now, is a set of unit tests.