[ https://issues.apache.org/jira/browse/FLINK-10462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008630#comment-17008630 ]
zhijiang commented on FLINK-10462:
----------------------------------
[~kevin.cyj] I think there are two requirements in your description.
* One is how many connections are established between two TaskManagers.
Currently this is determined by the ConnectionIndex generated by
IntermediateResult. If we want to derive the number of connections from the
number of netty threads by default, or from a configurable setting, we can also
drop the concept of ConnectionIndex from the topology completely, as this ticket
proposes. That would be easy to understand and, I guess, would not introduce an
unexpected regression.
* The other is when to release the connection between two TaskManagers. Let's
discuss that in the ticket you created, FLINK-15455. I think there are two sides
to this proposal. In YARN per-job mode, a previously cancelled task may be
resubmitted to the original TaskManager, so it can benefit from reusing the
existing connection. But in session mode, if one job finishes and another job is
submitted to the TaskManager, the connections needed between TaskManagers might
differ from job to job. E.g. jobA is submitted to TM1 and TM2, and jobB is
submitted to TM2 and TM3. In this case, retaining the connection between TM1 and
TM2 after jobA finishes would waste resources to some extent.
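
To make the session-mode example concrete, here is a minimal sketch of the jobA/jobB scenario. It is only a toy model, not Flink code: the class {{ConnectionRetentionSketch}} and its methods are hypothetical names, and it assumes connections are never released once established.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model; none of these names exist in Flink.
final class ConnectionRetentionSketch {

    // Maps an unordered TaskManager pair such as "TM1<->TM2" to the jobs using it.
    // Entries are never removed, which models "retain connections forever".
    private final Map<String, Set<String>> usersByPair = new HashMap<>();

    static String pairKey(String tmA, String tmB) {
        // Order the two names so that (TM1, TM2) and (TM2, TM1) share one key.
        return tmA.compareTo(tmB) <= 0 ? tmA + "<->" + tmB : tmB + "<->" + tmA;
    }

    void jobStarted(String job, String tmA, String tmB) {
        usersByPair.computeIfAbsent(pairKey(tmA, tmB), k -> new HashSet<>()).add(job);
    }

    void jobFinished(String job) {
        for (Set<String> users : usersByPair.values()) {
            users.remove(job);
        }
    }

    // Connections that are still retained although no running job uses them.
    Set<String> idleConnections() {
        Set<String> idle = new TreeSet<>();
        for (Map.Entry<String, Set<String>> entry : usersByPair.entrySet()) {
            if (entry.getValue().isEmpty()) {
                idle.add(entry.getKey());
            }
        }
        return idle;
    }

    public static void main(String[] args) {
        ConnectionRetentionSketch session = new ConnectionRetentionSketch();
        session.jobStarted("jobA", "TM1", "TM2");
        session.jobFinished("jobA");
        session.jobStarted("jobB", "TM2", "TM3");
        System.out.println(session.idleConnections()); // prints [TM1<->TM2]
    }
}
```

In per-job mode the same retained entry would be reused if the task came back to TM1 or TM2, which is the other side of the trade-off.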
> Remove ConnectionIndex for further sharing tcp connection in credit-based
> mode
> -------------------------------------------------------------------------------
>
> Key: FLINK-10462
> URL: https://issues.apache.org/jira/browse/FLINK-10462
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Network
> Affects Versions: 1.5.0, 1.5.1, 1.5.2, 1.5.3, 1.5.4, 1.6.0, 1.6.1
> Reporter: zhijiang
> Assignee: zhijiang
> Priority: Minor
>
> Every {{IntermediateResult}} generates a random {{ConnectionIndex}} which
> will be included in {{ConnectionID}}.
> The {{RemoteInputChannel}} requests to establish a TCP connection via
> {{ConnectionID}}. That means one TCP connection may be shared by multiple
> {{RemoteInputChannel}}s which have the same {{ConnectionID}}. By doing so, we
> can reduce the number of physical connections between two {{TaskManager}}s,
> which benefits large-scale jobs.
> But this sharing is limited to channels of the same {{IntermediateResult}}, and
> I think that is mainly because we may temporarily switch off {{autoread}} for
> the channel during back pressure in the previous network flow control. In
> credit-based mode, the channel is always open for transporting data of
> different intermediate results, so we can further share the TCP connection
> across different {{IntermediateResult}}s and remove this limit.
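
The sharing rule described above can be modeled roughly as follows. This is a simplified sketch, not Flink's implementation: {{ConnectionKey}} is a hypothetical stand-in for {{ConnectionID}} (assumed here to be just a remote address plus an index), and a loop counter stands in for the random per-result index.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical model of connection sharing; not Flink's actual classes.
final class ConnectionSharingSketch {

    // Simplified stand-in for ConnectionID: remote address plus connection index.
    static final class ConnectionKey {
        final String remoteAddress;
        final int connectionIndex;

        ConnectionKey(String remoteAddress, int connectionIndex) {
            this.remoteAddress = remoteAddress;
            this.connectionIndex = connectionIndex;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof ConnectionKey)) {
                return false;
            }
            ConnectionKey that = (ConnectionKey) other;
            return connectionIndex == that.connectionIndex
                    && remoteAddress.equals(that.remoteAddress);
        }

        @Override
        public int hashCode() {
            return Objects.hash(remoteAddress, connectionIndex);
        }
    }

    // Counts distinct keys (one TCP connection each) when numResults intermediate
    // results each have channelsPerResult channels to the same remote address.
    static int countConnections(
            String remoteAddress, int numResults, int channelsPerResult, boolean useConnectionIndex) {
        Set<ConnectionKey> connections = new HashSet<>();
        for (int result = 0; result < numResults; result++) {
            // Assumption: each result gets a distinct index; 0 models "index removed".
            int index = useConnectionIndex ? result : 0;
            for (int channel = 0; channel < channelsPerResult; channel++) {
                connections.add(new ConnectionKey(remoteAddress, index));
            }
        }
        return connections.size();
    }

    public static void main(String[] args) {
        System.out.println(countConnections("tm2:6121", 3, 4, true));  // prints 3
        System.out.println(countConnections("tm2:6121", 3, 4, false)); // prints 1
    }
}
```

With the index in the key, twelve channels across three results need three connections to the same TaskManager; without it they share one.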
--
This message was sent by Atlassian Jira
(v8.3.4#803005)