[
https://issues.apache.org/jira/browse/FLINK-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434680#comment-15434680
]
Zhijiang Wang commented on FLINK-4424:
--------------------------------------
Previously, the {{NetworkEnvironment}} was started lazily: the
{{ConnectionManager}} was started only once RPC communication between the
{{TaskManager}} and the {{JobManager}} had been established, because the
{{JobMaster}} notifies downstream tasks of the upstream connection info.
Instead, we can pre-start the {{ConnectionManager}} right after constructing
the {{NetworkEnvironment}} component. This makes the registration process
cleaner, because it is no longer mixed with starting the {{ConnectionManager}}.
@Till, as a first step I can extract the start-related logic of the
{{ConnectionManager}} and move it into the constructor of
{{NetworkEnvironment}}. The shutdown logic may depend on the progress of the
new TaskExecutor, so it could be fixed later. What do you think?
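To make the proposal concrete, here is a minimal sketch of the pre-start idea.
It is only an illustration under stated assumptions: the class names
(SketchConnectionManager, SketchNetworkEnvironment, PreStartSketch) and all of
their methods are made up and are not the actual Flink classes, and the netty
server is replaced by a plain java.net.ServerSocket bound to port 0, so that
the OS picks the data port.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical stand-in for the ConnectionManager; not the Flink class.
class SketchConnectionManager {
    private ServerSocket server;

    // Binds to port 0 so the OS assigns a free port; nothing is preconfigured.
    void start() {
        try {
            server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Closes the server socket; safe to call when never started.
    void shutdown() {
        if (server == null) {
            return;
        }
        try {
            server.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            server = null;
        }
    }

    boolean isStarted() {
        return server != null && !server.isClosed();
    }

    int dataPort() {
        if (!isStarted()) {
            throw new IllegalStateException("connection manager not started");
        }
        return server.getLocalPort();
    }
}

// Hypothetical stand-in for the NetworkEnvironment; not the Flink class.
class SketchNetworkEnvironment {
    private final SketchConnectionManager connectionManager;
    private String jobManagerAddress; // null while not associated

    // Pre-start: the connection manager is started in the constructor,
    // before any JobManager registration has happened.
    SketchNetworkEnvironment(SketchConnectionManager connectionManager) {
        this.connectionManager = connectionManager;
        this.connectionManager.start();
    }

    // Association only records the JobManager; it does not touch the server.
    void associateWithJobManager(String address) {
        this.jobManagerAddress = address;
    }

    void disassociateFromJobManager() {
        this.jobManagerAddress = null;
    }

    boolean isAssociated() {
        return jobManagerAddress != null;
    }

    int dataPort() {
        return connectionManager.dataPort();
    }

    void shutdown() {
        connectionManager.shutdown();
    }
}

class PreStartSketch {
    public static void main(String[] args) {
        SketchNetworkEnvironment env =
                new SketchNetworkEnvironment(new SketchConnectionManager());
        System.out.println("data port before registration: " + env.dataPort());
        env.associateWithJobManager("jobmanager-1"); // hypothetical address
        env.disassociateFromJobManager();
        System.out.println("data port after disassociation: " + env.dataPort());
        env.shutdown();
    }
}
```

In this sketch, associating with or disassociating from a JobManager never
starts or stops the server, so the data port is known before registration and
stays the same afterwards. Shutdown is a separate, explicit call.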
> Make network environment start-up/shutdown independent of JobManager
> association
> --------------------------------------------------------------------------------
>
> Key: FLINK-4424
> URL: https://issues.apache.org/jira/browse/FLINK-4424
> Project: Flink
> Issue Type: Improvement
> Components: Network, TaskManager
> Reporter: Till Rohrmann
>
> Currently, the {{TaskManager}} starts the netty network server only after it
> has registered with a {{JobManager}}. Upon loss of connection to the
> {{JobManager}} the {{NetworkEnvironment}} is closed.
> The start-up and shutdown of the network server should be independent of the
> {{JobManager}} connection, especially if we assume that a TM can be
> associated with multiple JobManagers in the future (FLIP-6).
> Starting the network server only once when the {{TaskManager}} is started has
> the advantage that we don't have to preconfigure the {{TaskManager}}'s data
> port. Furthermore, we don't risk getting stuck when disassociating from a
> {{JobManager}}, because the start-up and shutdown of a {{NetworkEnvironment}}
> can cause problems (it has to reserve/free resources).