[
https://issues.apache.org/jira/browse/FLINK-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14902150#comment-14902150
]
ASF GitHub Bot commented on FLINK-2722:
---------------------------------------
Github user rmetzger commented on a diff in the pull request:
https://github.com/apache/flink/pull/1159#discussion_r40058785
--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/net/NetUtils.java ---
@@ -189,9 +191,17 @@ public static InetAddress findConnectingAddress(InetSocketAddress targetAddress,
     long currentSleepTime = MIN_SLEEP_TIME;
     long elapsedTime = 0;
+    // before trying with different strategies: test with getLocalHost():
+    InetAddress localhostName = InetAddress.getLocalHost();
+
+    if(tryToConnect(localhostName, targetAddress, AddressDetectionState.ADDRESS.getTimeout(), false)) {
+        LOG.debug("Using immediately InetAddress.getLocalHost() for the connecting address");
--- End diff --
These are the log statements produced at `DEBUG` level:
```
16:12:19,822 INFO  org.apache.flink.runtime.util.LeaderRetrievalUtils - Trying to select the network interface and address to use by connecting to the leading JobManager.
16:12:19,822 INFO  org.apache.flink.runtime.util.LeaderRetrievalUtils - TaskManager will try to connect for 10000 milliseconds before falling back to heuristics
16:12:19,833 INFO  org.apache.flink.runtime.net.NetUtils - Retrieved new target address /10.240.221.7:33378.
16:12:19,835 DEBUG org.apache.flink.runtime.net.NetUtils - Trying to connect to (/10.240.221.7:33378) from local address cdh544-master.c.astral-sorter-757.internal/10.240.242.143 with timeout 50
16:12:19,838 DEBUG org.apache.flink.runtime.net.NetUtils - Using immediately InetAddress.getLocalHost() for the connecting address
16:12:19,839 INFO  org.apache.flink.runtime.taskmanager.TaskManager - TaskManager will use hostname/address 'cdh544-master.c.astral-sorter-757.internal' (10.240.242.143) for communication.
16:12:19,839 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager in streaming mode BATCH_ONLY
16:12:19,839 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager actor system at cdh544-master.c.astral-sorter-757.internal:0
```
I think the messages at `INFO` level contain enough information for users
to understand what's going on.
> Use InetAddress.getLocalHost() first when detecting TaskManager IP address
> --------------------------------------------------------------------------
>
> Key: FLINK-2722
> URL: https://issues.apache.org/jira/browse/FLINK-2722
> Project: Flink
> Issue Type: Bug
> Components: Distributed Runtime, TaskManager
> Affects Versions: 0.9, 0.10
> Reporter: Robert Metzger
> Assignee: Robert Metzger
> Fix For: 0.9.2
>
>
> A user reported a connection issue with Netty being unable to connect to a
> TaskManager to subscribe to an intermediate result.
> The problem occurred when the TaskManager and JobManager were running on the
> same host (something that can easily happen on YARN).
> In that case, the TaskManager was reporting a host-local IP address to the
> JobManager when connecting.
> To avoid the issue in the future, the TaskManager first tries to use the
> hostname returned by InetAddress.getLocalHost(). In a properly set-up
> environment, this will return an address that is reachable by all
> machines in the cluster.
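
The "try `InetAddress.getLocalHost()` first" idea described above can be sketched roughly as follows. This is a minimal, standalone illustration, not the code from the pull request: the class `LocalHostFirstSketch`, the methods `canConnectFrom` and `chooseConnectingAddress`, and the caller-supplied `fallback` address are hypothetical names introduced here, and the real `NetUtils.findConnectingAddress` falls back to further detection strategies that are not reproduced.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;

// Hypothetical sketch; names are illustrative and not part of Flink.
class LocalHostFirstSketch {

    /** Returns true if a TCP connection to target succeeds when bound to localAddress. */
    static boolean canConnectFrom(InetAddress localAddress, InetSocketAddress target, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // Bind to the candidate local address (ephemeral port) before connecting,
            // so the connection attempt really originates from that address.
            socket.bind(new InetSocketAddress(localAddress, 0));
            socket.connect(target, timeoutMillis);
            return true;
        } catch (IOException e) {
            // Bind failure, refused connection, or timeout: the candidate is not usable.
            return false;
        }
    }

    /**
     * Tries InetAddress.getLocalHost() first; if the target cannot be reached from it,
     * returns the caller-supplied fallback (an assumption standing in for the
     * remaining detection strategies).
     */
    static InetAddress chooseConnectingAddress(InetSocketAddress target, int timeoutMillis, InetAddress fallback) {
        try {
            InetAddress localHost = InetAddress.getLocalHost();
            if (canConnectFrom(localHost, target, timeoutMillis)) {
                return localHost;
            }
        } catch (UnknownHostException e) {
            // The local host name could not be resolved; use the fallback below.
        }
        return fallback;
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.out.println("usage: LocalHostFirstSketch <targetHost> <targetPort>");
            return;
        }
        InetSocketAddress target = new InetSocketAddress(args[0], Integer.parseInt(args[1]));
        InetAddress chosen = chooseConnectingAddress(target, 50, InetAddress.getLoopbackAddress());
        System.out.println("Connecting address: " + chosen);
    }
}
```

Binding the socket to the candidate address before connecting is what makes the check meaningful: it verifies that the target is reachable from that specific address rather than from whatever source address the operating system would pick. The 50 ms timeout in `main` only mirrors the value visible in the log output above.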
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)