Hi,
We are deploying Flink in a master-worker configuration, but we get the
following error when the TaskManager tries to connect to the JobManager. The
JobManager log on the master machine shows that it is up and running and has
been granted leadership; however, the TaskManager on the worker machine is
unable to connect to the JobManager.
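
One way to check whether this is a plain network problem rather than a Flink
one: the log below shows the TaskManager failing to open a TCP connection to
the JobManager at 10.10.10.11:6123 ("No route to host"), so the same connect
can be attempted outside Flink. This is a minimal sketch, assuming it is run
on the worker (bigtop2) and that the JobManager RPC port is 6123 as in the
log; `probe` is a hypothetical helper, not part of Flink.

```python
import socket
from typing import Optional


def probe(host: str, port: int, timeout_s: float = 2.0,
          source_ip: Optional[str] = None) -> Optional[str]:
    """Attempt a single TCP connection to host:port.

    Hypothetical diagnostic helper (not a Flink API). Returns None if the
    connection succeeds, otherwise the OS error text, for example
    'No route to host', 'Connection refused' or 'timed out'.
    """
    # Binding to (source_ip, 0) selects the local interface, similar to how
    # the TaskManager log tries each local address in turn; port 0 = any port.
    source = (source_ip, 0) if source_ip else None
    try:
        with socket.create_connection((host, port), timeout=timeout_s,
                                      source_address=source):
            return None
    except OSError as exc:
        return exc.strerror or str(exc)
```

On the worker, `probe("10.10.10.11", 6123)` and
`probe("10.10.10.11", 6123, source_ip="10.10.10.12")` should return None if
the JobManager port is reachable. An error such as "No route to host" would
point at the network path between the two machines (for example a firewall)
rather than at the Flink configuration.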

Any input on this?

Here are the contents of the TaskManager log file:
-------------------------------------------------------------------------------------------------------------------


2016-04-17 23:02:00,147 INFO org.apache.flink.runtime.taskmanager.TaskManager              - --------------------------------------------------------------------------------
2016-04-17 23:02:00,148 INFO org.apache.flink.runtime.taskmanager.TaskManager              -  Starting TaskManager (Version: 1.0.0, Rev:<unknown>, Date:<unknown>)
2016-04-17 23:02:00,150 INFO org.apache.flink.runtime.taskmanager.TaskManager              -  Current user: root
2016-04-17 23:02:00,151 INFO org.apache.flink.runtime.taskmanager.TaskManager              -  JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.7/24.95-b01
2016-04-17 23:02:00,151 INFO org.apache.flink.runtime.taskmanager.TaskManager              -  Maximum heap size: 918 MiBytes
2016-04-17 23:02:00,151 INFO org.apache.flink.runtime.taskmanager.TaskManager              -  JAVA_HOME: (not set)
2016-04-17 23:02:00,155 INFO org.apache.flink.runtime.taskmanager.TaskManager              -  Hadoop version: 2.7.1
2016-04-17 23:02:00,156 INFO org.apache.flink.runtime.taskmanager.TaskManager              -  JVM Options:
2016-04-17 23:02:00,156 INFO org.apache.flink.runtime.taskmanager.TaskManager              - -XX:MaxPermSize=256m
2016-04-17 23:02:00,157 INFO org.apache.flink.runtime.taskmanager.TaskManager              - -Dlog.file=/usr/lib/flink/log/flink-root-taskmanager-0-bigtop2.vagrant.log
2016-04-17 23:02:00,157 INFO org.apache.flink.runtime.taskmanager.TaskManager              - -Dlog4j.configuration=file:/usr/lib/flink/conf/log4j.properties
2016-04-17 23:02:00,157 INFO org.apache.flink.runtime.taskmanager.TaskManager              - -Dlogback.configurationFile=file:/usr/lib/flink/conf/logback.xml
2016-04-17 23:02:00,157 INFO org.apache.flink.runtime.taskmanager.TaskManager              -  Program Arguments:
2016-04-17 23:02:00,157 INFO org.apache.flink.runtime.taskmanager.TaskManager              - --configDir
2016-04-17 23:02:00,157 INFO org.apache.flink.runtime.taskmanager.TaskManager              - /etc/flink/conf
2016-04-17 23:02:00,157 INFO org.apache.flink.runtime.taskmanager.TaskManager              -  Classpath: /usr/lib/flink/lib/flink-dist_2.10-1.0.0.jar:/usr/lib/flink/lib/flink-python_2.10-1.0.0.jar:/usr/lib/flink/lib/log4j-1.2.17.jar:/usr/lib/flink/lib/slf4j-log4j12-1.7.7.jar:::
2016-04-17 23:02:00,157 INFO org.apache.flink.runtime.taskmanager.TaskManager              - --------------------------------------------------------------------------------
2016-04-17 23:02:00,177 INFO org.apache.flink.runtime.taskmanager.TaskManager              - Registered UNIX signal handlers for [TERM, HUP, INT]
2016-04-17 23:02:00,188 INFO org.apache.flink.runtime.taskmanager.TaskManager              - Maximum number of open file descriptors is 4096
2016-04-17 23:02:00,264 INFO org.apache.flink.runtime.taskmanager.TaskManager              - Loading configuration from /etc/flink/conf
2016-04-17 23:02:00,471 INFO org.apache.flink.runtime.taskmanager.TaskManager              - Security is not enabled. Starting non-authenticated TaskManager.
2016-04-17 23:02:00,552 INFO org.apache.flink.runtime.util.LeaderRetrievalUtils            - Trying to select the network interface and address to use by connecting to the leading JobManager.
2016-04-17 23:02:00,552 INFO org.apache.flink.runtime.util.LeaderRetrievalUtils            - TaskManager will try to connect for 10000 milliseconds before falling back to heuristics
2016-04-17 23:02:00,555 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Retrieved new target address /10.10.10.11:6123.
2016-04-17 23:02:01,749 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Trying to connect to address /10.10.10.11:6123
2016-04-17 23:02:01,769 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address 'bigtop2.vagrant/10.10.10.12': No route to host
2016-04-17 23:02:01,781 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/10.10.10.12': No route to host
2016-04-17 23:02:01,781 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe6c:8d4c%3': Network is unreachable
2016-04-17 23:02:01,798 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/10.10.10.12': No route to host
2016-04-17 23:02:01,798 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe39:183c%2': Network is unreachable
2016-04-17 23:02:01,859 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/10.0.2.15': connect timed out
2016-04-17 23:02:01,860 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/0:0:0:0:0:0:0:1%1': Network is unreachable
2016-04-17 23:02:01,860 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/127.0.0.1': Invalid argument
2016-04-17 23:02:01,860 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe6c:8d4c%3': Network is unreachable
2016-04-17 23:02:02,870 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/10.10.10.12': connect timed out
2016-04-17 23:02:02,871 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe39:183c%2': Network is unreachable
2016-04-17 23:02:03,872 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/10.0.2.15': connect timed out
2016-04-17 23:02:03,872 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/0:0:0:0:0:0:0:1%1': Network is unreachable
2016-04-17 23:02:03,873 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/127.0.0.1': Invalid argument
2016-04-17 23:02:03,973 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Trying to connect to address /10.10.10.11:6123
2016-04-17 23:02:03,974 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address 'bigtop2.vagrant/10.10.10.12': No route to host
2016-04-17 23:02:04,025 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/10.10.10.12': connect timed out
2016-04-17 23:02:04,025 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe6c:8d4c%3': Network is unreachable
2016-04-17 23:02:04,076 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/10.10.10.12': connect timed out
2016-04-17 23:02:04,076 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe39:183c%2': Network is unreachable
2016-04-17 23:02:04,127 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/10.0.2.15': connect timed out
2016-04-17 23:02:04,128 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/0:0:0:0:0:0:0:1%1': Network is unreachable
2016-04-17 23:02:04,128 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/127.0.0.1': Invalid argument
2016-04-17 23:02:04,128 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe6c:8d4c%3': Network is unreachable
2016-04-17 23:02:05,130 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/10.10.10.12': connect timed out
2016-04-17 23:02:05,130 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe39:183c%2': Network is unreachable
2016-04-17 23:02:06,131 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/10.0.2.15': connect timed out
2016-04-17 23:02:06,132 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/0:0:0:0:0:0:0:1%1': Network is unreachable
2016-04-17 23:02:06,132 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/127.0.0.1': Invalid argument
2016-04-17 23:02:06,332 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Trying to connect to address /10.10.10.11:6123
2016-04-17 23:02:06,333 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address 'bigtop2.vagrant/10.10.10.12': No route to host
2016-04-17 23:02:06,334 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/10.10.10.12': No route to host
2016-04-17 23:02:06,335 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe6c:8d4c%3': Network is unreachable
2016-04-17 23:02:06,385 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/10.10.10.12': connect timed out
2016-04-17 23:02:06,386 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe39:183c%2': Network is unreachable
2016-04-17 23:02:06,436 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/10.0.2.15': connect timed out
2016-04-17 23:02:06,437 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/0:0:0:0:0:0:0:1%1': Network is unreachable
2016-04-17 23:02:06,437 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/127.0.0.1': Invalid argument
2016-04-17 23:02:06,437 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe6c:8d4c%3': Network is unreachable
2016-04-17 23:02:07,439 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/10.10.10.12': connect timed out
2016-04-17 23:02:07,440 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe39:183c%2': Network is unreachable
2016-04-17 23:02:08,441 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/10.0.2.15': connect timed out
2016-04-17 23:02:08,442 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/0:0:0:0:0:0:0:1%1': Network is unreachable
2016-04-17 23:02:08,442 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/127.0.0.1': Invalid argument
2016-04-17 23:02:08,842 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Trying to connect to address /10.10.10.11:6123
2016-04-17 23:02:08,843 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address 'bigtop2.vagrant/10.10.10.12': No route to host
2016-04-17 23:02:08,894 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/10.10.10.12': connect timed out
2016-04-17 23:02:08,895 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe6c:8d4c%3': Network is unreachable
2016-04-17 23:02:08,946 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/10.10.10.12': connect timed out
2016-04-17 23:02:08,946 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe39:183c%2': Network is unreachable
2016-04-17 23:02:08,997 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/10.0.2.15': connect timed out
2016-04-17 23:02:08,997 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/0:0:0:0:0:0:0:1%1': Network is unreachable
2016-04-17 23:02:08,997 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/127.0.0.1': Invalid argument
2016-04-17 23:02:08,998 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe6c:8d4c%3': Network is unreachable
2016-04-17 23:02:09,999 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/10.10.10.12': connect timed out
2016-04-17 23:02:10,000 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe39:183c%2': Network is unreachable
2016-04-17 23:02:11,001 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/10.0.2.15': connect timed out
2016-04-17 23:02:11,001 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/0:0:0:0:0:0:0:1%1': Network is unreachable
2016-04-17 23:02:11,002 INFO org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/127.0.0.1': Invalid argument
2016-04-17 23:02:11,002 WARN org.apache.flink.runtime.net.ConnectionUtils                  - Could not connect to /10.10.10.11:6123. Selecting a local address using heuristics.
2016-04-17 23:02:11,003 INFO org.apache.flink.runtime.taskmanager.TaskManager              - TaskManager will use hostname/address 'bigtop2.vagrant' (10.10.10.12) for communication.
2016-04-17 23:02:11,004 INFO org.apache.flink.runtime.taskmanager.TaskManager              - Starting TaskManager
2016-04-17 23:02:11,004 INFO org.apache.flink.runtime.taskmanager.TaskManager              - Starting TaskManager actor system at 10.10.10.12:6122
2016-04-17 23:02:11,570 INFO akka.event.slf4j.Slf4jLogger                                  - Slf4jLogger started
2016-04-17 23:02:11,652 INFO Remoting                                                      - Starting remoting
2016-04-17 23:02:11,830 INFO org.apache.flink.runtime.taskmanager.TaskManager              - Starting TaskManager actor
2016-04-17 23:02:11,835 INFO Remoting                                                      - Remoting started; listening on addresses :[akka.tcp://flink@10.10.10.12:6122]
2016-04-17 23:02:11,842 INFO org.apache.flink.runtime.io.network.netty.NettyConfig         - NettyConfig [server address: bigtop2.vagrant/10.10.10.12, server port: 51100, memory segment size (bytes): 32768, transport type: NIO, number of server threads: 1 (manual), number of client threads: 1 (manual), server connect backlog: 0 (use Netty's default), client connect timeout (sec): 120, send/receive buffer size (bytes): 0 (use Netty's default)]
2016-04-17 23:02:11,845 INFO org.apache.flink.runtime.taskmanager.TaskManager              - Messages between TaskManager and JobManager have a max timeout of 10000 milliseconds
2016-04-17 23:02:11,847 INFO org.apache.flink.runtime.taskmanager.TaskManager              - Temporary file directory '/tmp': total 18 GB, usable 16 GB (88.89% usable)
2016-04-17 23:02:11,999 INFO org.apache.flink.runtime.io.network.buffer.NetworkBufferPool  - Allocated 64 MB for network buffer pool (number of memory segments: 2048, bytes per segment: 32768).
2016-04-17 23:02:12,036 INFO org.apache.flink.runtime.taskmanager.TaskManager              - Limiting managed memory to 0.7 of the currently free heap space (593 MB), memory will be allocated lazily.
2016-04-17 23:02:12,046 INFO org.apache.flink.runtime.io.disk.iomanager.IOManager          - I/O manager uses directory /tmp/flink-io-28a3d591-2a02-4582-87e9-35a812c83b1c for spill files.
2016-04-17 23:02:12,058 INFO org.apache.flink.runtime.filecache.FileCache                  - User file cache uses directory /tmp/flink-dist-cache-3f0b2e74-86b5-4ef2-856f-37bca5ea18a1
2016-04-17 23:02:12,261 INFO org.apache.flink.runtime.taskmanager.TaskManager              - Starting TaskManager actor at akka://flink/user/taskmanager#1978479123.
2016-04-17 23:02:12,262 INFO org.apache.flink.runtime.taskmanager.TaskManager              - TaskManager data connection information: bigtop2.vagrant (dataPort=51100)
2016-04-17 23:02:12,263 INFO org.apache.flink.runtime.taskmanager.TaskManager              - TaskManager has 1 task slot(s).
2016-04-17 23:02:12,264 INFO org.apache.flink.runtime.taskmanager.TaskManager              - Memory usage stats: [HEAP: 80/161/918 MB, NON HEAP: 26/28/304 MB (used/committed/max)]
2016-04-17 23:02:12,268 INFO org.apache.flink.runtime.taskmanager.TaskManager              - Trying to register at JobManager akka.tcp://flink@10.10.10.11:6123/user/jobmanager (attempt 1, timeout: 500 milliseconds)
2016-04-17 23:02:12,397 WARN akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@10.10.10.11:6123] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://flink@10.10.10.11:6123]].
2016-04-17 23:02:12,796 INFO org.apache.flink.runtime.taskmanager.TaskManager              - Trying to register at JobManager akka.tcp://flink@10.10.10.11:6123/user/jobmanager (attempt 2, timeout: 1000 milliseconds)
2016-04-17 23:02:13,816 INFO org.apache.flink.runtime.taskmanager.TaskManager              - Trying to register at JobManager akka.tcp://flink@10.10.10.11:6123/user/jobmanager (attempt 3, timeout: 2000 milliseconds)
2016-04-17 23:02:15,836 INFO org.apache.flink.runtime.taskmanager.TaskManager              - Trying to register at JobManager akka.tcp://flink@10.10.10.11:6123/user/jobmanager (attempt 4, timeout: 4000 milliseconds)
2016-04-17 23:02:19,856 INFO org.apache.flink.runtime.taskmanager.TaskManager              - Trying to register at JobManager akka.tcp://flink@10.10.10.11:6123/user/jobmanager (attempt 5, timeout: 8000 milliseconds)
2016-04-17 23:02:19,863 WARN akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@10.10.10.11:6123] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://flink@10.10.10.11:6123]].
2016-04-17 23:02:27,883 INFO org.apache.flink.runtime.taskmanager.TaskManager              - Trying to register at JobManager akka.tcp://flink@10.10.10.11:6123/user/jobmanager (attempt 6, timeout: 16000 milliseconds)
2016-04-17 23:02:27,889 WARN akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@10.10.10.11:6123] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://flink@10.10.10.11:6123]].
2016-04-17 23:02:43,895 INFO org.apache.flink.runtime.taskmanager.TaskManager              - Trying to register at JobManager akka.tcp://flink@10.10.10.11:6123/user/jobmanager (attempt 7, timeout: 30 seconds)
2016-04-17 23:02:50,918 WARN akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@10.10.10.11:6123] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://flink@10.10.10.11:6123]].
2016-04-17 23:03:13,915 INFO org.apache.flink.runtime.taskmanager.TaskManager              - Trying to register at JobManager akka.tcp://flink@10.10.10.11:6123/user/jobmanager (attempt 8, timeout: 30 seconds)
2016-04-17 23:03:13,921 WARN akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@10.10.10.11:6123] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://flink@10.10.10.11:6123]].
2016-04-17 23:03:43,936 INFO org.apache.flink.runtime.taskmanager.TaskManager              - Trying to register at JobManager akka.tcp://flink@10.10.10.11:6123/user/jobmanager (attempt 9, timeout: 30 seconds)
2016-04-17 23:03:43,942 WARN akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@10.10.10.11:6123] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://flink@10.10.10.11:6123]].

-- 
Regards,
Harshita Agrawal