Yakov Zhdanov created IGNITE-5056:
-------------------------------------
Summary: Implement communication backpressure control
Key: IGNITE-5056
URL: https://issues.apache.org/jira/browse/IGNITE-5056
Project: Ignite
Issue Type: Improvement
Reporter: Yakov Zhdanov
Assignee: Yakov Zhdanov
Priority: Critical
Fix For: 2.1
Problem
Currently, backpressure control relies on a semaphore on the sending side, which
ensures that the send queue cannot overflow, and on a special counter on the
receiving side, which stops reading from the socket when the unprocessed message
count exceeds the configured limit.
In some scenarios this may lead to a distributed deadlock. For example, we send
many async jobs to remote nodes, which in turn perform sync cache operations. If
the task master node is a primary or backup for some of those cache updates and
has already scheduled too many job requests for sending, it will not be able to
respond to cache requests, so the remote jobs will never complete.
Solution
Reading from the socket should never stop.
Design notes
* add IgniteConfiguration.maxAsyncRequests and propagate it via node attributes
to all nodes of the cluster. Nodes may have different values (however, this
is unlikely).
* add a flag GridIoMessage.async. If the flag is false, the sender node is
assumed to wait synchronously for the response; otherwise it does not wait.
* all sent async messages should be tracked on sender node on per-receiver
basis.
* all received async messages should be tracked on receiver nodes.
* nodes should add a flag to communication acks indicating whether they can
accept more async messages or not
* sender should never exceed IgniteConfiguration.maxAsyncRequests async
requests per node
* if the IgniteConfiguration.maxAsyncRequests limit is reached, or the node sets
the flag in a communication ack, then all async messages become sync
* above means:
** the next compute job from the task is sent to the node only after a response
for some previous job arrives
** the next dht update request (for primary sync or full async) is sent, but the
node does not send a response to the near node until it has received a response
from the remote backup for a former operation or for this operation
** the next cache operation becomes sync: we force user code to wait on the
operation future.
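
The sender-side bookkeeping described in the design notes could look roughly
like the sketch below. It is only an illustration under stated assumptions: the
class AsyncRequestTracker and its methods are hypothetical and are not part of
the Ignite API, nodes are identified by plain strings, and a single
maxAsyncRequests value is applied to every receiver (the notes allow per-node
values propagated via node attributes, which this sketch does not model).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sender-side tracker of in-flight async requests, kept per receiver node. */
class AsyncRequestTracker {
    /** Assumed to mirror the proposed IgniteConfiguration.maxAsyncRequests value. */
    private final int maxAsyncRequests;

    /** In-flight async request count, keyed by receiver node ID. */
    private final ConcurrentMap<String, AtomicInteger> inFlight = new ConcurrentHashMap<>();

    /** Receivers whose latest communication ack said they cannot accept more async messages. */
    private final ConcurrentMap<String, Boolean> saturated = new ConcurrentHashMap<>();

    AsyncRequestTracker(int maxAsyncRequests) {
        if (maxAsyncRequests <= 0)
            throw new IllegalArgumentException("maxAsyncRequests must be positive");

        this.maxAsyncRequests = maxAsyncRequests;
    }

    /**
     * Tries to reserve an async slot for the given receiver.
     *
     * @return true if the message may be sent async; false if the caller
     *      has to fall back to sync mode (wait for a response before sending more).
     */
    boolean tryAcquireAsync(String nodeId) {
        if (saturated.getOrDefault(nodeId, Boolean.FALSE))
            return false;

        AtomicInteger cnt = inFlight.computeIfAbsent(nodeId, k -> new AtomicInteger());

        // CAS loop so that concurrent senders can never push the counter past the limit.
        while (true) {
            int cur = cnt.get();

            if (cur >= maxAsyncRequests)
                return false;

            if (cnt.compareAndSet(cur, cur + 1))
                return true;
        }
    }

    /** Releases one slot when a response for an async request arrives. */
    void onResponse(String nodeId) {
        AtomicInteger cnt = inFlight.get(nodeId);

        if (cnt != null)
            cnt.updateAndGet(v -> v > 0 ? v - 1 : 0);
    }

    /** Records the flag carried by a communication ack from the receiver. */
    void onAck(String nodeId, boolean canAcceptMoreAsync) {
        saturated.put(nodeId, !canAcceptMoreAsync);
    }

    /** Current number of in-flight async requests for the receiver. */
    int inFlightCount(String nodeId) {
        AtomicInteger cnt = inFlight.get(nodeId);

        return cnt == null ? 0 : cnt.get();
    }
}
```

In this sketch a caller would invoke tryAcquireAsync before each send and, on a
false result, switch to sync behaviour for that message, for example by waiting
on the operation future. Reading from the socket is never blocked here; only
the sending side is throttled.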