Ufuk Celebi created FLINK-1093:
----------------------------------
Summary: Add non-multiplexing NetworkConnectionManager
Key: FLINK-1093
URL: https://issues.apache.org/jira/browse/FLINK-1093
Project: Flink
Issue Type: Improvement
Components: Distributed Runtime
Affects Versions: 0.7-incubating
Reporter: Ufuk Celebi
Priority: Minor
When translating a JobGraph to an ExecutionGraph, edges are assigned connection
indexes to ensure deadlock freedom of the network stack. Deadlocks can easily
occur when multiplexing multiple logical channels over a single TCP connection,
because of the limited network buffers. Each incoming envelope, which has an
attached buffer, needs to request a network buffer from the receiver's input
buffer pool. If the receiver does not have a buffer available, it unsubscribes
from the TCP connection until a new buffer is available. This unsubscribe then
stalls all other logical channels that are multiplexed over the same TCP
connection. With certain user programs this can result in a deadlock.
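
The blocking behaviour described above can be illustrated with a small,
self-contained sketch. It is a toy model and not Flink's actual network stack:
the class name, the drain method, and the representation of envelopes as plain
channel indexes are all assumptions made for illustration. The shared TCP
connection is modelled as a FIFO queue, and each logical channel has a fixed
number of free buffers.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical toy model, not part of Flink.
class MultiplexedConnectionSketch {

    /**
     * Reads envelopes from the shared connection in FIFO order. Each envelope
     * is represented only by the index of its target logical channel.
     * Reading stops (the "unsubscribe") at the first envelope whose channel
     * has no free buffer, even if later envelopes target channels that do.
     */
    static List<Integer> drain(Deque<Integer> connection, int[] freeBuffers) {
        List<Integer> delivered = new ArrayList<>();
        while (!connection.isEmpty()) {
            int channel = connection.peek();
            if (freeBuffers[channel] == 0) {
                break; // no buffer: stop reading from the whole connection
            }
            freeBuffers[channel]--;
            delivered.add(connection.poll());
        }
        return delivered;
    }

    public static void main(String[] args) {
        // Two envelopes for channel 0, followed by two for channel 1.
        Deque<Integer> connection = new ArrayDeque<>(Arrays.asList(0, 0, 1, 1));
        // Channel 0 has one free buffer, channel 1 has two.
        int[] freeBuffers = {1, 2};

        System.out.println(drain(connection, freeBuffers)); // prints [0]
        System.out.println(connection);                     // prints [0, 1, 1]
    }
}
```

In this model channel 1 receives nothing although it has two free buffers,
because its envelopes are queued behind one for channel 0. If channel 0 can
only free a buffer after channel 1 has made progress, neither side can
continue.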
One of the problems when running in multiplexed mode is the question of when to
close TCP connections in a reliable way without losing or re-ordering network
envelopes (as evident in FLINK-1063). In non-multiplexed mode this is trivial,
because every logical channel is mapped to its own physical channel, so closing
the logical channel translates directly to closing the physical channel.
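
A minimal sketch of the one-to-one mapping, assuming a manager that keys
connections by a channel ID. The class, its methods, and the PhysicalConnection
stand-in are hypothetical names chosen for illustration and are not the
proposed NetworkConnectionManager interface; a real implementation would wrap
a socket.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the proposed Flink implementation.
class NonMultiplexingManagerSketch {

    /** Stand-in for a TCP connection. */
    static final class PhysicalConnection {
        private boolean open = true;

        boolean isOpen() { return open; }

        void close() { open = false; }
    }

    // Exactly one physical connection per logical channel ID.
    private final Map<String, PhysicalConnection> connections = new HashMap<>();

    /** Returns the connection for the channel, opening it on first use. */
    PhysicalConnection open(String channelId) {
        return connections.computeIfAbsent(channelId, id -> new PhysicalConnection());
    }

    /**
     * Closing the logical channel closes its physical connection. No other
     * channel shares the connection, so nothing else has to be coordinated.
     */
    boolean close(String channelId) {
        PhysicalConnection connection = connections.remove(channelId);
        if (connection == null) {
            return false; // unknown or already closed channel
        }
        connection.close();
        return true;
    }

    int openConnections() { return connections.size(); }

    public static void main(String[] args) {
        NonMultiplexingManagerSketch manager = new NonMultiplexingManagerSketch();
        PhysicalConnection a = manager.open("channel-a");
        PhysicalConnection b = manager.open("channel-b");

        manager.close("channel-a");

        System.out.println(a.isOpen());                // prints false
        System.out.println(b.isOpen());                // prints true
        System.out.println(manager.openConnections()); // prints 1
    }
}
```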
However, this approach results in limited scalability as high degrees of
parallelism result in a high number of TCP connections per machine, which are
continuously opened and closed. Still, it is useful as a baseline and should
work fine with smaller setups.