[
https://issues.apache.org/jira/browse/IGNITE-20081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexey Scherbakov updated IGNITE-20081:
---------------------------------------
Labels: ignite-3 ignite3_performance (was: ignite-3)
> Implement "weakSend" properly, add "weakInvoke"
> -----------------------------------------------
>
> Key: IGNITE-20081
> URL: https://issues.apache.org/jira/browse/IGNITE-20081
> Project: Ignite
> Issue Type: Improvement
> Reporter: Ivan Bessonov
> Priority: Major
> Labels: ignite-3, ignite3_performance
>
> The idea: some components, such as RAFT, tolerate lost messages, so strict
> delivery guarantees may be counterproductive for them.
> However, the current implementation of "weakSend" is just a wrapper around
> "send" that doesn't return a future. This API must be redesigned and properly
> implemented.
> h3. API
> * {{CompletableFuture<Void> weakSend(ClusterNode recipient, NetworkMessage msg, long timeout);}}
> * {{CompletableFuture<NetworkMessage> weakInvoke(ClusterNode recipient, NetworkMessage msg, long timeout);}}
> The returned futures complete in one of two cases:
> * an ack or a response has been received;
> * the timeout has expired.
> This means that a very large timeout is probably a bad idea for such messages.
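To make the two completion cases concrete, here is a minimal caller-side sketch in plain Java. It does not use any Ignite API: `WeakFutureDemo`, `withTimeout` and `outcome` are hypothetical names, and it assumes that an expired timeout surfaces as a `TimeoutException` on the future, which the ticket only states explicitly for invoke.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper that mimics how a weak future could be observed by a caller. */
final class WeakFutureDemo {
    /** Creates a future that fails with a TimeoutException unless it is completed in time. */
    static <T> CompletableFuture<T> withTimeout(long timeoutMillis) {
        return new CompletableFuture<T>().orTimeout(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    /** Blocks until the future is done and describes how it ended. */
    static String outcome(CompletableFuture<?> future) {
        try {
            future.join();
            return "completed"; // Case 1: an ack or a response arrived.
        } catch (CompletionException e) {
            // Case 2: nothing arrived before the timeout expired.
            return e.getCause() instanceof TimeoutException ? "timed out" : "failed";
        }
    }
}
```

With this shape, a lost message keeps its future pending for the whole timeout, which is why a very large timeout is costly for weak messages.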
> h3. Implementation
> * with a stable and fast connection, weak communication should behave the same
> way as regular communication from the client's standpoint;
> * if the message queue for a given connection is full, we may/should:
> ** remove all weak messages from the existing queue that have definitely not
> been sent;
> ** reject new weak messages;
> ** maybe throttle, but this is out of scope;
> * alternatively, if the connection breaks, we may start removing weak messages
> from the queue and/or rejecting new ones.
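The full-queue policy above could be sketched roughly as follows. This is an illustration under assumptions, not the Ignite implementation: `OutboundQueueSketch` and its methods are hypothetical, the queue holds only messages that have not yet been handed to the transport, and strict messages are always accepted because throttling is out of scope.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical per-connection queue of messages that have not been sent yet. */
final class OutboundQueueSketch {
    private static final class Entry {
        final String payload;
        final boolean weak;

        Entry(String payload, boolean weak) {
            this.payload = payload;
            this.weak = weak;
        }
    }

    private final Deque<Entry> unsent = new ArrayDeque<>();
    private final int capacity;

    OutboundQueueSketch(int capacity) {
        this.capacity = capacity;
    }

    /** Returns false if the message was rejected. */
    synchronized boolean offer(String payload, boolean weak) {
        if (unsent.size() >= capacity) {
            // The queue is full: drop every weak message that was never sent.
            unsent.removeIf(entry -> entry.weak);
            if (weak) {
                return false; // Reject new weak messages while under pressure.
            }
        }
        unsent.addLast(new Entry(payload, weak));
        return true;
    }

    /** Hands the next unsent message to the (imaginary) transport, or null if empty. */
    synchronized String poll() {
        Entry entry = unsent.pollFirst();
        return entry == null ? null : entry.payload;
    }

    synchronized int size() {
        return unsent.size();
    }
}
```

In this sketch, a dropped or rejected weak message is simply forgotten; a fuller version would also have to decide how the corresponding future completes.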
> Weak send and weak invoke may behave differently.
> For example, "weakSend" requires an ack, so it has to be marked with a
> "message number" in the recovery descriptor.
> "weakInvoke", however, doesn't need an ack; it only requires a response (it
> already has a "correlationId"), so not re-sending it after a reconnect
> shouldn't break the recovery protocol. It doesn't need a "message number" in
> the recovery descriptor, and we can save some resources by reducing the number
> of acks.
> One more important thing:
> * when an invoke future fails with a timeout exception, we must clean up the
> corresponding correlation ID from the map;
> * when we receive a "node left" event for some node, we should complete all
> futures returned for that node with some "NodeLeftException", and clean up
> all its correlation IDs from the map as well.
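The cleanup rules above can be sketched in plain Java as follows. Everything here is an assumption made for illustration: `InvokeTrackerSketch`, its methods, the use of strings for nodes and messages, and the shape of `NodeLeftException` (the ticket only asks for "some" exception of that name). Correlation IDs are assigned sequentially, starting at 1.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical exception type, not an existing Ignite class. */
final class NodeLeftException extends RuntimeException {
    NodeLeftException(String node) {
        super("Node left: " + node);
    }
}

/** Hypothetical tracker of pending invokes, keyed by correlation ID. */
final class InvokeTrackerSketch {
    private static final class Pending {
        final String node;
        final CompletableFuture<String> future = new CompletableFuture<>();

        Pending(String node) {
            this.node = node;
        }
    }

    private final Map<Long, Pending> byCorrelationId = new ConcurrentHashMap<>();
    private final AtomicLong correlationIds = new AtomicLong();

    CompletableFuture<String> weakInvoke(String node, String msg, long timeoutMillis) {
        long id = correlationIds.incrementAndGet();
        Pending pending = new Pending(node);
        byCorrelationId.put(id, pending);
        // A real implementation would pass (node, id, msg) to the transport here.
        return pending.future
                .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
                // Runs on response, timeout, or node left; the ID never outlives the future.
                .whenComplete((response, error) -> byCorrelationId.remove(id));
    }

    void onResponse(long correlationId, String response) {
        Pending pending = byCorrelationId.remove(correlationId);
        if (pending != null) {
            pending.future.complete(response);
        }
    }

    /** Fails every pending invoke addressed to the departed node and forgets its IDs. */
    void onNodeLeft(String node) {
        for (Map.Entry<Long, Pending> entry : byCorrelationId.entrySet()) {
            Pending pending = entry.getValue();
            if (pending.node.equals(node) && byCorrelationId.remove(entry.getKey(), pending)) {
                pending.future.completeExceptionally(new NodeLeftException(node));
            }
        }
    }

    int pendingCount() {
        return byCorrelationId.size();
    }
}
```

Returning the stage produced by `whenComplete` means the map entry is already gone by the time a caller observes the timeout, which keeps a unit test for the cleanup deterministic.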
> h3. Integration
> Will be done separately. All we need for now is a set of unit tests.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)