Hi,

We are deploying Flink in a master-worker configuration, but we get the following error when the TaskManager tries to connect to the JobManager. The JobManager log on the master machine shows that it is up, running, and has been granted leadership; however, the TaskManager on the worker machine is unable to connect to it.
Any inputs on this? Here are the log file contents for the TaskManager:

-------------------------------------------------------------------------------------------------------------------
2016-04-17 23:02:00,147 INFO org.apache.flink.runtime.taskmanager.TaskManager - --------------------------------------------------------------------------------
2016-04-17 23:02:00,148 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager (Version: 1.0.0, Rev:<unknown>, Date:<unknown>)
2016-04-17 23:02:00,150 INFO org.apache.flink.runtime.taskmanager.TaskManager - Current user: root
2016-04-17 23:02:00,151 INFO org.apache.flink.runtime.taskmanager.TaskManager - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.7/24.95-b01
2016-04-17 23:02:00,151 INFO org.apache.flink.runtime.taskmanager.TaskManager - Maximum heap size: 918 MiBytes
2016-04-17 23:02:00,151 INFO org.apache.flink.runtime.taskmanager.TaskManager - JAVA_HOME: (not set)
2016-04-17 23:02:00,155 INFO org.apache.flink.runtime.taskmanager.TaskManager - Hadoop version: 2.7.1
2016-04-17 23:02:00,156 INFO org.apache.flink.runtime.taskmanager.TaskManager - JVM Options:
2016-04-17 23:02:00,156 INFO org.apache.flink.runtime.taskmanager.TaskManager - -XX:MaxPermSize=256m
2016-04-17 23:02:00,157 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Dlog.file=/usr/lib/flink/log/flink-root-taskmanager-0-bigtop2.vagrant.log
2016-04-17 23:02:00,157 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Dlog4j.configuration=file:/usr/lib/flink/conf/log4j.properties
2016-04-17 23:02:00,157 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Dlogback.configurationFile=file:/usr/lib/flink/conf/logback.xml
2016-04-17 23:02:00,157 INFO org.apache.flink.runtime.taskmanager.TaskManager - Program Arguments:
2016-04-17 23:02:00,157 INFO org.apache.flink.runtime.taskmanager.TaskManager - --configDir
2016-04-17 23:02:00,157 INFO org.apache.flink.runtime.taskmanager.TaskManager - /etc/flink/conf
2016-04-17 23:02:00,157 INFO org.apache.flink.runtime.taskmanager.TaskManager - Classpath: /usr/lib/flink/lib/flink-dist_2.10-1.0.0.jar:/usr/lib/flink/lib/flink-python_2.10-1.0.0.jar:/usr/lib/flink/lib/log4j-1.2.17.jar:/usr/lib/flink/lib/slf4j-log4j12-1.7.7.jar:::
2016-04-17 23:02:00,157 INFO org.apache.flink.runtime.taskmanager.TaskManager - --------------------------------------------------------------------------------
2016-04-17 23:02:00,177 INFO org.apache.flink.runtime.taskmanager.TaskManager - Registered UNIX signal handlers for [TERM, HUP, INT]
2016-04-17 23:02:00,188 INFO org.apache.flink.runtime.taskmanager.TaskManager - Maximum number of open file descriptors is 4096
2016-04-17 23:02:00,264 INFO org.apache.flink.runtime.taskmanager.TaskManager - Loading configuration from /etc/flink/conf
2016-04-17 23:02:00,471 INFO org.apache.flink.runtime.taskmanager.TaskManager - Security is not enabled. Starting non-authenticated TaskManager.
2016-04-17 23:02:00,552 INFO org.apache.flink.runtime.util.LeaderRetrievalUtils - Trying to select the network interface and address to use by connecting to the leading JobManager.
2016-04-17 23:02:00,552 INFO org.apache.flink.runtime.util.LeaderRetrievalUtils - TaskManager will try to connect for 10000 milliseconds before falling back to heuristics
2016-04-17 23:02:00,555 INFO org.apache.flink.runtime.net.ConnectionUtils - Retrieved new target address /10.10.10.11:6123.
2016-04-17 23:02:01,749 INFO org.apache.flink.runtime.net.ConnectionUtils - Trying to connect to address /10.10.10.11:6123
2016-04-17 23:02:01,769 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address 'bigtop2.vagrant/10.10.10.12': No route to host
2016-04-17 23:02:01,781 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.10.10.12': No route to host
2016-04-17 23:02:01,781 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe6c:8d4c%3': Network is unreachable
2016-04-17 23:02:01,798 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.10.10.12': No route to host
2016-04-17 23:02:01,798 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe39:183c%2': Network is unreachable
2016-04-17 23:02:01,859 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.2.15': connect timed out
2016-04-17 23:02:01,860 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/0:0:0:0:0:0:0:1%1': Network is unreachable
2016-04-17 23:02:01,860 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument
2016-04-17 23:02:01,860 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe6c:8d4c%3': Network is unreachable
2016-04-17 23:02:02,870 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.10.10.12': connect timed out
2016-04-17 23:02:02,871 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe39:183c%2': Network is unreachable
2016-04-17 23:02:03,872 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.2.15': connect timed out
2016-04-17 23:02:03,872 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/0:0:0:0:0:0:0:1%1': Network is unreachable
2016-04-17 23:02:03,873 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument
2016-04-17 23:02:03,973 INFO org.apache.flink.runtime.net.ConnectionUtils - Trying to connect to address /10.10.10.11:6123
2016-04-17 23:02:03,974 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address 'bigtop2.vagrant/10.10.10.12': No route to host
2016-04-17 23:02:04,025 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.10.10.12': connect timed out
2016-04-17 23:02:04,025 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe6c:8d4c%3': Network is unreachable
2016-04-17 23:02:04,076 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.10.10.12': connect timed out
2016-04-17 23:02:04,076 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe39:183c%2': Network is unreachable
2016-04-17 23:02:04,127 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.2.15': connect timed out
2016-04-17 23:02:04,128 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/0:0:0:0:0:0:0:1%1': Network is unreachable
2016-04-17 23:02:04,128 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument
2016-04-17 23:02:04,128 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe6c:8d4c%3': Network is unreachable
2016-04-17 23:02:05,130 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.10.10.12': connect timed out
2016-04-17 23:02:05,130 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe39:183c%2': Network is unreachable
2016-04-17 23:02:06,131 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.2.15': connect timed out
2016-04-17 23:02:06,132 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/0:0:0:0:0:0:0:1%1': Network is unreachable
2016-04-17 23:02:06,132 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument
2016-04-17 23:02:06,332 INFO org.apache.flink.runtime.net.ConnectionUtils - Trying to connect to address /10.10.10.11:6123
2016-04-17 23:02:06,333 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address 'bigtop2.vagrant/10.10.10.12': No route to host
2016-04-17 23:02:06,334 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.10.10.12': No route to host
2016-04-17 23:02:06,335 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe6c:8d4c%3': Network is unreachable
2016-04-17 23:02:06,385 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.10.10.12': connect timed out
2016-04-17 23:02:06,386 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe39:183c%2': Network is unreachable
2016-04-17 23:02:06,436 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.2.15': connect timed out
2016-04-17 23:02:06,437 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/0:0:0:0:0:0:0:1%1': Network is unreachable
2016-04-17 23:02:06,437 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument
2016-04-17 23:02:06,437 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe6c:8d4c%3': Network is unreachable
2016-04-17 23:02:07,439 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.10.10.12': connect timed out
2016-04-17 23:02:07,440 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe39:183c%2': Network is unreachable
2016-04-17 23:02:08,441 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.2.15': connect timed out
2016-04-17 23:02:08,442 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/0:0:0:0:0:0:0:1%1': Network is unreachable
2016-04-17 23:02:08,442 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument
2016-04-17 23:02:08,842 INFO org.apache.flink.runtime.net.ConnectionUtils - Trying to connect to address /10.10.10.11:6123
2016-04-17 23:02:08,843 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address 'bigtop2.vagrant/10.10.10.12': No route to host
2016-04-17 23:02:08,894 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.10.10.12': connect timed out
2016-04-17 23:02:08,895 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe6c:8d4c%3': Network is unreachable
2016-04-17 23:02:08,946 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.10.10.12': connect timed out
2016-04-17 23:02:08,946 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe39:183c%2': Network is unreachable
2016-04-17 23:02:08,997 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.2.15': connect timed out
2016-04-17 23:02:08,997 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/0:0:0:0:0:0:0:1%1': Network is unreachable
2016-04-17 23:02:08,997 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument
2016-04-17 23:02:08,998 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe6c:8d4c%3': Network is unreachable
2016-04-17 23:02:09,999 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.10.10.12': connect timed out
2016-04-17 23:02:10,000 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe39:183c%2': Network is unreachable
2016-04-17 23:02:11,001 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.2.15': connect timed out
2016-04-17 23:02:11,001 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/0:0:0:0:0:0:0:1%1': Network is unreachable
2016-04-17 23:02:11,002 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument
2016-04-17 23:02:11,002 WARN org.apache.flink.runtime.net.ConnectionUtils - Could not connect to /10.10.10.11:6123. Selecting a local address using heuristics.
2016-04-17 23:02:11,003 INFO org.apache.flink.runtime.taskmanager.TaskManager - TaskManager will use hostname/address 'bigtop2.vagrant' (10.10.10.12) for communication.
2016-04-17 23:02:11,004 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager
2016-04-17 23:02:11,004 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager actor system at 10.10.10.12:6122
2016-04-17 23:02:11,570 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
2016-04-17 23:02:11,652 INFO Remoting - Starting remoting
2016-04-17 23:02:11,830 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager actor
2016-04-17 23:02:11,835 INFO Remoting - Remoting started; listening on addresses :[akka.tcp://flink@10.10.10.12:6122]
2016-04-17 23:02:11,842 INFO org.apache.flink.runtime.io.network.netty.NettyConfig - NettyConfig [server address: bigtop2.vagrant/10.10.10.12, server port: 51100, memory segment size (bytes): 32768, transport type: NIO, number of server threads: 1 (manual), number of client threads: 1 (manual), server connect backlog: 0 (use Netty's default), client connect timeout (sec): 120, send/receive buffer size (bytes): 0 (use Netty's default)]
2016-04-17 23:02:11,845 INFO org.apache.flink.runtime.taskmanager.TaskManager - Messages between TaskManager and JobManager have a max timeout of 10000 milliseconds
2016-04-17 23:02:11,847 INFO org.apache.flink.runtime.taskmanager.TaskManager - Temporary file directory '/tmp': total 18 GB, usable 16 GB (88.89% usable)
2016-04-17 23:02:11,999 INFO org.apache.flink.runtime.io.network.buffer.NetworkBufferPool - Allocated 64 MB for network buffer pool (number of memory segments: 2048, bytes per segment: 32768).
2016-04-17 23:02:12,036 INFO org.apache.flink.runtime.taskmanager.TaskManager - Limiting managed memory to 0.7 of the currently free heap space (593 MB), memory will be allocated lazily.
2016-04-17 23:02:12,046 INFO org.apache.flink.runtime.io.disk.iomanager.IOManager - I/O manager uses directory /tmp/flink-io-28a3d591-2a02-4582-87e9-35a812c83b1c for spill files.
2016-04-17 23:02:12,058 INFO org.apache.flink.runtime.filecache.FileCache - User file cache uses directory /tmp/flink-dist-cache-3f0b2e74-86b5-4ef2-856f-37bca5ea18a1
2016-04-17 23:02:12,261 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager actor at akka://flink/user/taskmanager#1978479123.
2016-04-17 23:02:12,262 INFO org.apache.flink.runtime.taskmanager.TaskManager - TaskManager data connection information: bigtop2.vagrant (dataPort=51100)
2016-04-17 23:02:12,263 INFO org.apache.flink.runtime.taskmanager.TaskManager - TaskManager has 1 task slot(s).
2016-04-17 23:02:12,264 INFO org.apache.flink.runtime.taskmanager.TaskManager - Memory usage stats: [HEAP: 80/161/918 MB, NON HEAP: 26/28/304 MB (used/committed/max)]
2016-04-17 23:02:12,268 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@10.10.10.11:6123/user/jobmanager (attempt 1, timeout: 500 milliseconds)
2016-04-17 23:02:12,397 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@10.10.10.11:6123] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://flink@10.10.10.11:6123]].
2016-04-17 23:02:12,796 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@10.10.10.11:6123/user/jobmanager (attempt 2, timeout: 1000 milliseconds)
2016-04-17 23:02:13,816 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@10.10.10.11:6123/user/jobmanager (attempt 3, timeout: 2000 milliseconds)
2016-04-17 23:02:15,836 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@10.10.10.11:6123/user/jobmanager (attempt 4, timeout: 4000 milliseconds)
2016-04-17 23:02:19,856 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@10.10.10.11:6123/user/jobmanager (attempt 5, timeout: 8000 milliseconds)
2016-04-17 23:02:19,863 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@10.10.10.11:6123] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://flink@10.10.10.11:6123]].
2016-04-17 23:02:27,883 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@10.10.10.11:6123/user/jobmanager (attempt 6, timeout: 16000 milliseconds)
2016-04-17 23:02:27,889 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@10.10.10.11:6123] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://flink@10.10.10.11:6123]].
2016-04-17 23:02:43,895 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@10.10.10.11:6123/user/jobmanager (attempt 7, timeout: 30 seconds)
2016-04-17 23:02:50,918 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@10.10.10.11:6123] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://flink@10.10.10.11:6123]].
2016-04-17 23:03:13,915 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@10.10.10.11:6123/user/jobmanager (attempt 8, timeout: 30 seconds)
2016-04-17 23:03:13,921 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@10.10.10.11:6123] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://flink@10.10.10.11:6123]].
2016-04-17 23:03:43,936 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@10.10.10.11:6123/user/jobmanager (attempt 9, timeout: 30 seconds)
2016-04-17 23:03:43,942 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@10.10.10.11:6123] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://flink@10.10.10.11:6123]].

--
Regards,
Harshita Agrawal
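
P.S. In the log above, the TaskManager tries each local address in turn against /10.10.10.11:6123 and every attempt fails. Below is a minimal sketch of the same kind of check done outside Flink, so that a plain network problem (routing, firewall) can be told apart from a Flink configuration problem. It is not Flink's own logic: the function names `probe` and `probe_all` are hypothetical, it handles IPv4 addresses only, and it assumes Python 3 is available on the worker.

```python
import socket


def probe(local_addr, target_host, target_port, timeout=2.0):
    """Try one TCP connect to the target, bound to a given local IPv4 address.

    Returns (True, None) on success, or (False, error text) on failure.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # Port 0 lets the OS choose the source port on this local address.
        sock.bind((local_addr, 0))
        sock.connect((target_host, target_port))
        return True, None
    except OSError as exc:
        # Covers timeouts, "No route to host", refused connections, bad binds.
        return False, str(exc)
    finally:
        sock.close()


def probe_all(local_addrs, target_host, target_port, timeout=2.0):
    """Probe the target from every listed local address; returns a dict."""
    return {
        addr: probe(addr, target_host, target_port, timeout)
        for addr in local_addrs
    }
```

On the worker this could be called as `probe_all(["10.10.10.12", "10.0.2.15"], "10.10.10.11", 6123)`, using the addresses from the log. If every entry comes back as a failure, the problem is most likely below Flink, at the network level between the two machines.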
