[ 
https://issues.apache.org/jira/browse/FLINK-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen updated FLINK-1608:
--------------------------------
    Description: 
The TaskManagers use a NetUtils routine to find an interface that lets them 
talk to the JobManager. However, if the JobManager is not online yet, they fall 
back to some non-localhost device.

In cases where the TaskManagers start faster than the JobManager, they pick the 
wrong hostname and interface.

The later step that connects to the JobManager actor already retries on 
failure. I think we need similar retry logic here...
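
A minimal sketch of what such retry logic could look like, assuming the 
interface detection can be expressed as a probe that either yields a usable 
local address or nothing. The class and method names (AddressDetectionRetry, 
findWithRetries, probeInterfaces) and all parameters are hypothetical; this is 
not the existing NetUtils API, only an illustration of retrying before falling 
back:

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.NetworkInterface;
import java.net.Socket;
import java.net.SocketException;
import java.util.Collections;
import java.util.Enumeration;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical helper, not part of Flink: retries interface detection
// before giving up and using a fallback address.
final class AddressDetectionRetry {

    // Runs the probe up to maxAttempts times, pausing between attempts.
    // The fallback is returned only after every attempt has failed
    // (or the waiting thread was interrupted).
    static <T> T findWithRetries(Supplier<Optional<T>> probe, int maxAttempts,
                                 long pauseMillis, T fallback) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<T> found = probe.get();
            if (found.isPresent()) {
                return found.get();
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return fallback;
    }

    // One probe round: tries to open a TCP connection to the JobManager
    // from each local address and returns the first address that succeeds.
    static Optional<InetAddress> probeInterfaces(InetSocketAddress jobManager,
                                                 int timeoutMillis) {
        Enumeration<NetworkInterface> interfaces;
        try {
            interfaces = NetworkInterface.getNetworkInterfaces();
        } catch (SocketException e) {
            return Optional.empty();
        }
        if (interfaces == null) {
            return Optional.empty();
        }
        for (NetworkInterface nif : Collections.list(interfaces)) {
            for (InetAddress local : Collections.list(nif.getInetAddresses())) {
                try (Socket socket = new Socket()) {
                    socket.bind(new InetSocketAddress(local, 0));
                    socket.connect(jobManager, timeoutMillis);
                    return Optional.of(local);
                } catch (IOException e) {
                    // JobManager not reachable from this address; try the next one.
                }
            }
        }
        return Optional.empty();
    }
}
```

A TaskManager could then call something like 
findWithRetries(() -> probeInterfaces(jobManagerAddress, 1000), 10, 500L, fallbackAddress), 
so that the fallback is chosen only after the JobManager has had time to come 
online. The attempt count and pause shown here are placeholders.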

  was:
The taskmanagers use a NetUtils routine to find an interface that lets them 
talk to the Jobmanager. However, if the JobManager is not online yet, they fall 
back to localhost.

In cases where the TaskManagers start faster than the JobManager, they pick the 
wrong hostname and interface.


> TaskManagers may pick wrong network interface when starting before JobManager
> -----------------------------------------------------------------------------
>
>                 Key: FLINK-1608
>                 URL: https://issues.apache.org/jira/browse/FLINK-1608
>             Project: Flink
>          Issue Type: Bug
>          Components: TaskManager
>    Affects Versions: 0.9
>            Reporter: Stephan Ewen
>             Fix For: 0.9
>
>
> The TaskManagers use a NetUtils routine to find an interface that lets them 
> talk to the JobManager. However, if the JobManager is not online yet, they 
> fall back to some non-localhost device.
> In cases where the TaskManagers start faster than the JobManager, they pick 
> the wrong hostname and interface.
> The later step that connects to the JobManager actor already retries on 
> failure. I think we need similar retry logic here...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
