[ 
https://issues.apache.org/jira/browse/FLINK-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15008556#comment-15008556
 ] 

ASF GitHub Bot commented on FLINK-2967:
---------------------------------------

Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1361#discussion_r45052205
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/net/ConnectionUtils.java ---
    @@ -180,7 +180,41 @@ public static InetAddress findConnectingAddress(InetSocketAddress targetAddress,
                }
        }
     
    +   /**
    +    * This utility method tries to connect to the JobManager using the InetAddress returned by
    +    * InetAddress.getLocalHost(). The purpose of the utility is to have a final try connecting to
    +    * the target address using the LocalHost before using the address returned.
    +    * We do a second try because the JM might have been unavailable during the first check.
    +    *
    +    * @param preliminaryResult The address detected by the heuristic
    +    * @return either the preliminaryResult or the address returned by InetAddress.getLocalHost() (if
    +    *                      we are able to connect to targetAddress from there)
    +    */
    +   private static InetAddress tryLocalHostBeforeReturning(InetAddress preliminaryResult, SocketAddress targetAddress, boolean logging) throws IOException {
    +           InetAddress localhostName = InetAddress.getLocalHost();
    +           if(tryToConnect(localhostName, targetAddress, AddressDetectionState.LOCAL_HOST.getTimeout(), logging)) {
    --- End diff --
    
    I think we can use a somewhat higher timeout here. 200ms is probably 
enough in most cases, but why make it so short? If the connection succeeds 
quickly, the longer timeout adds no delay, and if it does not succeed, we have 
already gone through the other interfaces anyway, so it is better to spend 
another second to make sure we get it right.
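
To illustrate the trade-off, here is a minimal sketch of a final LocalHost check with a generous timeout. It is not the code from the pull request: the class and the methods `canConnect` and `preferLocalHost` are hypothetical names, and the timeout is left to the caller (for example 1000 ms for "another second"). The connect timeout is only an upper bound, so a fast success returns immediately and the larger value costs time only on failure.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketAddress;

public class LocalHostRetrySketch {

    /**
     * Hypothetical helper: binds a socket to fromAddress and tries to reach
     * target within timeoutMillis. Returns false on timeout or any I/O error.
     */
    public static boolean canConnect(InetAddress fromAddress, SocketAddress target, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // Port 0 lets the OS pick a free local port on the given address.
            socket.bind(new InetSocketAddress(fromAddress, 0));
            // Blocks for at most timeoutMillis; returns as soon as the connection is up.
            socket.connect(target, timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /**
     * Hypothetical final check: prefer localHost if the target is reachable
     * from it, otherwise keep the preliminary result of the earlier heuristic.
     */
    public static InetAddress preferLocalHost(
            InetAddress preliminary, InetAddress localHost, SocketAddress target, int timeoutMillis) {
        return canConnect(localHost, target, timeoutMillis) ? localHost : preliminary;
    }
}
```

A caller would pass `InetAddress.getLocalHost()` as `localHost` and the JobManager address as `target`.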


> TM address detection might not always detect the right interface on slow 
> networks / overloaded JMs
> --------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-2967
>                 URL: https://issues.apache.org/jira/browse/FLINK-2967
>             Project: Flink
>          Issue Type: Bug
>    Affects Versions: 0.9, 0.10.0, 1.0.0
>            Reporter: Robert Metzger
>            Assignee: Robert Metzger
>
> I'm talking to a user who is facing the following issue:
> Some of the TaskManagers select the wrong IP address out of the available 
> network interfaces.
> The first address we try to connect to is the one returned by 
> {{InetAddress.getLocalHost()}}. This address is the right IP address to use, 
> but the JobManager is not able to respond within the timeout (50ms) to that 
> connection request.
> So the TM tries the next address, which is not publicly reachable. However, 
> the TM can connect to the JM from there. Netty will later fail to connect to 
> the TM from the other TMs.
> There are two solutions for this issue:
> - Allow users to configure a higher timeout for the first address detection 
> strategy. In most cases, the address returned by 
> {{InetAddress.getLocalHost()}} is correct. By setting a high timeout, users 
> with slow networks / overloaded JMs can make sure the TM picks this address.
> - Add an Akka message which we send from the TM to the JM, and the JM tries 
> to connect to the TM. If that succeeds, we know that the TM is reachable from 
> the outside.
> The problem is that we have to start a separate actor system on the 
> TaskManager first. We have to do this because we might use a wrong IP address 
> for the TM (so we might end up starting actor systems until we find an 
> externally reachable IP).
> I'm first going to implement the first approach. If that solution works well 
> for my user, I'll contribute this to 0.10 / 1.0.
> If not, I'll implement the second approach.
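
The failure mode and the first proposed fix can be sketched with a small simulation. This is an illustration under stated assumptions, not Flink code: the class, the `Probe` interface, and the methods `pickAddress` and `simulatedProbe` are hypothetical, and the addresses and latencies in the usage note are made up. Each candidate address is tried in order with its own timeout, and the first one that connects in time wins.

```java
import java.util.Map;

public class DetectionTimeoutSketch {

    /** Hypothetical probe: reports whether a connect from candidate succeeds within timeoutMillis. */
    public interface Probe {
        boolean connects(String candidate, int timeoutMillis);
    }

    /**
     * Tries the candidates in the map's iteration order (use a LinkedHashMap),
     * each with its own timeout in milliseconds. Returns the first candidate
     * that connects in time, or null if none does.
     */
    public static String pickAddress(Map<String, Integer> candidatesWithTimeouts, Probe probe) {
        for (Map.Entry<String, Integer> entry : candidatesWithTimeouts.entrySet()) {
            if (probe.connects(entry.getKey(), entry.getValue())) {
                return entry.getKey();
            }
        }
        return null;
    }

    /**
     * Simulates a slow or overloaded JobManager: a connect from a candidate
     * succeeds only if its assumed latency fits within the timeout.
     * Candidates without an entry never connect.
     */
    public static Probe simulatedProbe(Map<String, Integer> latencyMillisByCandidate) {
        return (candidate, timeoutMillis) ->
                latencyMillisByCandidate.getOrDefault(candidate, Integer.MAX_VALUE) <= timeoutMillis;
    }
}
```

Suppose the correct address answers after 120 ms and a second, not publicly reachable address answers after 10 ms. With a 50 ms timeout on both, `pickAddress` skips the correct address and returns the second one; raising only the first timeout to 200 ms makes it return the correct address.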



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
