[
https://issues.apache.org/jira/browse/FLINK-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stephan Ewen updated FLINK-1608:
--------------------------------
Description:
The TaskManager uses a NetUtils routine to pick a network interface that lets
it talk to the JobManager. However, if the JobManager is not online yet, the
TaskManager falls back to an arbitrary non-localhost device.
In cases where the TaskManagers start faster than the JobManager, they may pick
the wrong interface (and associated address and hostname).
The later logic (that tries to connect to the JobManager actor) does several
retries. I think we need similar logic when looking for a suitable network
interface to use.
was:
The taskmanagers use a NetUtils routine to find an interface that lets them
talk to the Jobmanager. However, if the JobManager is not online yet, they fall
back to some non-localhost device.
In cases where the TaskManagers start faster than the JobManager, they pick the
wrong hostname and interface.
The later logic (that tries to connect to the JobManager actor) has a logic
with retries. I think we need a similar logic here...
> TaskManagers may pick wrong network interface when starting before JobManager
> -----------------------------------------------------------------------------
>
> Key: FLINK-1608
> URL: https://issues.apache.org/jira/browse/FLINK-1608
> Project: Flink
> Issue Type: Bug
> Components: TaskManager
> Affects Versions: 0.9
> Reporter: Stephan Ewen
> Fix For: 0.9
>
>
> The TaskManager uses a NetUtils routine to pick a network interface that lets
> it talk to the JobManager. However, if the JobManager is not online yet, the
> TaskManager falls back to an arbitrary non-localhost device.
> In cases where the TaskManagers start faster than the JobManager, they may
> pick the wrong interface (and associated address and hostname).
> The later logic (that tries to connect to the JobManager actor) does several
> retries. I think we need similar logic when looking for a suitable network
> interface to use.
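The retry idea in the description could look roughly like the Java sketch
below. It is not Flink's NetUtils code: the class name InterfaceLookupSketch,
both method names, and the attempt and delay parameters are made up for
illustration. It also assumes that a TCP connect to the JobManager's address,
bound to a candidate local address, is an acceptable reachability test.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.NetworkInterface;
import java.net.Socket;
import java.net.SocketException;
import java.util.Collections;
import java.util.Enumeration;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical sketch; names and parameters are assumptions, not Flink APIs. */
public class InterfaceLookupSketch {

    /**
     * Runs the probe up to maxAttempts times, sleeping between failed
     * attempts. The delay doubles after each failure, capped at 10 seconds.
     */
    static <T> Optional<T> retry(Supplier<Optional<T>> probe,
                                 int maxAttempts,
                                 long initialDelayMillis) {
        long delay = initialDelayMillis;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<T> result = probe.get();
            if (result.isPresent()) {
                return result;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and give up early.
                    Thread.currentThread().interrupt();
                    return Optional.empty();
                }
                delay = Math.min(delay * 2, 10_000L);
            }
        }
        return Optional.empty();
    }

    /**
     * Returns the first non-loopback local address from which a TCP
     * connection to the target succeeds, or empty if none does.
     */
    static Optional<InetAddress> findConnectingAddress(InetSocketAddress target,
                                                       int timeoutMillis) {
        try {
            Enumeration<NetworkInterface> interfaces =
                    NetworkInterface.getNetworkInterfaces();
            if (interfaces == null) {
                return Optional.empty();
            }
            for (NetworkInterface nif : Collections.list(interfaces)) {
                for (InetAddress local : Collections.list(nif.getInetAddresses())) {
                    if (local.isLoopbackAddress()) {
                        continue;
                    }
                    try (Socket socket = new Socket()) {
                        // Bind to this interface's address before connecting.
                        socket.bind(new InetSocketAddress(local, 0));
                        socket.connect(target, timeoutMillis);
                        return Optional.of(local);
                    } catch (IOException e) {
                        // Target not reachable from this address; try the next.
                    }
                }
            }
        } catch (SocketException e) {
            // Interfaces could not be enumerated; count as a failed attempt.
        }
        return Optional.empty();
    }
}
```

A caller might run
retry(() -> findConnectingAddress(jobManagerAddress, 200), 10, 500L)
and fall back to an arbitrary non-localhost device only after every attempt
has failed, so that a JobManager that comes up a few seconds late is still
found.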