Seems like you have a network communication issue on your private docker network. Or perhaps the daemons aren't configured to accept connections from anything but localhost? I might be barking up the wrong tree, but that's the idea I got from glancing at the error log.
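One way to tell those two cases apart is a plain TCP probe against the JobManager port. Below is a minimal sketch, not anything Flink ships: the helper name `probe` and the 3 second timeout are my own assumptions, and the address 10.10.10.11:6123 is simply the one shown in the log.

```python
import socket


def probe(host, port, timeout=3.0):
    """Try a plain TCP connect and return (ok, detail).

    Hypothetical helper for illustration only; it checks reachability
    of a port and knows nothing about Flink or Akka.
    """
    try:
        # create_connection resolves the host and applies the timeout
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:
        # Covers refused connections, timeouts, and "No route to host"
        return False, str(exc)
```

Running `probe("10.10.10.11", 6123)` on the worker and again on the master itself, then `probe("127.0.0.1", 6123)` on the master, should narrow it down. If only the loopback probe succeeds, the daemon is likely bound to localhost; if the probe succeeds on the master but fails from the worker, the network or a firewall between the two machines is the more likely culprit.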
Cos

On Sun, Apr 17, 2016 at 07:27PM, Harshita Agrawal wrote:
> Hi,
> We are deploying Flink in master-worker configuration but receiving the
> following error when taskmanager tries to connect with jobmanager. The host
> machine Jobmanager log shows that is up, running and successfully granted
> leadership however, in the worker machine the Taskmanager is unable to
> connect to the Jobmanager.
>
> Any inputs on this?
>
> Here is the log file contents for TaskManager:
> -------------------------------------------------------------------------------------------------------------------
>
> 2016-04-17 23:02:00,147 INFO org.apache.flink.runtime.taskmanager.TaskManager - --------------------------------------------------------------------------------
> 2016-04-17 23:02:00,148 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager (Version: 1.0.0, Rev:<unknown>, Date:<unknown>)
> 2016-04-17 23:02:00,150 INFO org.apache.flink.runtime.taskmanager.TaskManager - Current user: root
> 2016-04-17 23:02:00,151 INFO org.apache.flink.runtime.taskmanager.TaskManager - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.7/24.95-b01
> 2016-04-17 23:02:00,151 INFO org.apache.flink.runtime.taskmanager.TaskManager - Maximum heap size: 918 MiBytes
> 2016-04-17 23:02:00,151 INFO org.apache.flink.runtime.taskmanager.TaskManager - JAVA_HOME: (not set)
> 2016-04-17 23:02:00,155 INFO org.apache.flink.runtime.taskmanager.TaskManager - Hadoop version: 2.7.1
> 2016-04-17 23:02:00,156 INFO org.apache.flink.runtime.taskmanager.TaskManager - JVM Options:
> 2016-04-17 23:02:00,156 INFO org.apache.flink.runtime.taskmanager.TaskManager - -XX:MaxPermSize=256m
> 2016-04-17 23:02:00,157 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Dlog.file=/usr/lib/flink/log/flink-root-taskmanager-0-bigtop2.vagrant.log
> 2016-04-17 23:02:00,157 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Dlog4j.configuration=file:/usr/lib/flink/conf/log4j.properties
> 2016-04-17 23:02:00,157 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Dlogback.configurationFile=file:/usr/lib/flink/conf/logback.xml
> 2016-04-17 23:02:00,157 INFO org.apache.flink.runtime.taskmanager.TaskManager - Program Arguments:
> 2016-04-17 23:02:00,157 INFO org.apache.flink.runtime.taskmanager.TaskManager - --configDir
> 2016-04-17 23:02:00,157 INFO org.apache.flink.runtime.taskmanager.TaskManager - /etc/flink/conf
> 2016-04-17 23:02:00,157 INFO org.apache.flink.runtime.taskmanager.TaskManager - Classpath: /usr/lib/flink/lib/flink-dist_2.10-1.0.0.jar:/usr/lib/flink/lib/flink-python_2.10-1.0.0.jar:/usr/lib/flink/lib/log4j-1.2.17.jar:/usr/lib/flink/lib/slf4j-log4j12-1.7.7.jar:::
> 2016-04-17 23:02:00,157 INFO org.apache.flink.runtime.taskmanager.TaskManager - --------------------------------------------------------------------------------
> 2016-04-17 23:02:00,177 INFO org.apache.flink.runtime.taskmanager.TaskManager - Registered UNIX signal handlers for [TERM, HUP, INT]
> 2016-04-17 23:02:00,188 INFO org.apache.flink.runtime.taskmanager.TaskManager - Maximum number of open file descriptors is 4096
> 2016-04-17 23:02:00,264 INFO org.apache.flink.runtime.taskmanager.TaskManager - Loading configuration from /etc/flink/conf
> 2016-04-17 23:02:00,471 INFO org.apache.flink.runtime.taskmanager.TaskManager - Security is not enabled. Starting non-authenticated TaskManager.
> 2016-04-17 23:02:00,552 INFO org.apache.flink.runtime.util.LeaderRetrievalUtils - Trying to select the network interface and address to use by connecting to the leading JobManager.
> 2016-04-17 23:02:00,552 INFO org.apache.flink.runtime.util.LeaderRetrievalUtils - TaskManager will try to connect for 10000 milliseconds before falling back to heuristics
> 2016-04-17 23:02:00,555 INFO org.apache.flink.runtime.net.ConnectionUtils - Retrieved new target address /10.10.10.11:6123.
> 2016-04-17 23:02:01,749 INFO org.apache.flink.runtime.net.ConnectionUtils - Trying to connect to address /10.10.10.11:6123
> 2016-04-17 23:02:01,769 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address 'bigtop2.vagrant/10.10.10.12': No route to host
> 2016-04-17 23:02:01,781 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.10.10.12': No route to host
> 2016-04-17 23:02:01,781 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe6c:8d4c%3': Network is unreachable
> 2016-04-17 23:02:01,798 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.10.10.12': No route to host
> 2016-04-17 23:02:01,798 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe39:183c%2': Network is unreachable
> 2016-04-17 23:02:01,859 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.2.15': connect timed out
> 2016-04-17 23:02:01,860 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/0:0:0:0:0:0:0:1%1': Network is unreachable
> 2016-04-17 23:02:01,860 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument
> 2016-04-17 23:02:01,860 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe6c:8d4c%3': Network is unreachable
> 2016-04-17 23:02:02,870 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.10.10.12': connect timed out
> 2016-04-17 23:02:02,871 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe39:183c%2': Network is unreachable
> 2016-04-17 23:02:03,872 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.2.15': connect timed out
> 2016-04-17 23:02:03,872 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/0:0:0:0:0:0:0:1%1': Network is unreachable
> 2016-04-17 23:02:03,873 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument
> 2016-04-17 23:02:03,973 INFO org.apache.flink.runtime.net.ConnectionUtils - Trying to connect to address /10.10.10.11:6123
> 2016-04-17 23:02:03,974 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address 'bigtop2.vagrant/10.10.10.12': No route to host
> 2016-04-17 23:02:04,025 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.10.10.12': connect timed out
> 2016-04-17 23:02:04,025 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe6c:8d4c%3': Network is unreachable
> 2016-04-17 23:02:04,076 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.10.10.12': connect timed out
> 2016-04-17 23:02:04,076 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe39:183c%2': Network is unreachable
> 2016-04-17 23:02:04,127 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.2.15': connect timed out
> 2016-04-17 23:02:04,128 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/0:0:0:0:0:0:0:1%1': Network is unreachable
> 2016-04-17 23:02:04,128 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument
> 2016-04-17 23:02:04,128 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe6c:8d4c%3': Network is unreachable
> 2016-04-17 23:02:05,130 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.10.10.12': connect timed out
> 2016-04-17 23:02:05,130 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe39:183c%2': Network is unreachable
> 2016-04-17 23:02:06,131 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.2.15': connect timed out
> 2016-04-17 23:02:06,132 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/0:0:0:0:0:0:0:1%1': Network is unreachable
> 2016-04-17 23:02:06,132 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument
> 2016-04-17 23:02:06,332 INFO org.apache.flink.runtime.net.ConnectionUtils - Trying to connect to address /10.10.10.11:6123
> 2016-04-17 23:02:06,333 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address 'bigtop2.vagrant/10.10.10.12': No route to host
> 2016-04-17 23:02:06,334 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.10.10.12': No route to host
> 2016-04-17 23:02:06,335 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe6c:8d4c%3': Network is unreachable
> 2016-04-17 23:02:06,385 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.10.10.12': connect timed out
> 2016-04-17 23:02:06,386 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe39:183c%2': Network is unreachable
> 2016-04-17 23:02:06,436 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.2.15': connect timed out
> 2016-04-17 23:02:06,437 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/0:0:0:0:0:0:0:1%1': Network is unreachable
> 2016-04-17 23:02:06,437 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument
> 2016-04-17 23:02:06,437 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe6c:8d4c%3': Network is unreachable
> 2016-04-17 23:02:07,439 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.10.10.12': connect timed out
> 2016-04-17 23:02:07,440 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe39:183c%2': Network is unreachable
> 2016-04-17 23:02:08,441 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.2.15': connect timed out
> 2016-04-17 23:02:08,442 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/0:0:0:0:0:0:0:1%1': Network is unreachable
> 2016-04-17 23:02:08,442 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument
> 2016-04-17 23:02:08,842 INFO org.apache.flink.runtime.net.ConnectionUtils - Trying to connect to address /10.10.10.11:6123
> 2016-04-17 23:02:08,843 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address 'bigtop2.vagrant/10.10.10.12': No route to host
> 2016-04-17 23:02:08,894 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.10.10.12': connect timed out
> 2016-04-17 23:02:08,895 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe6c:8d4c%3': Network is unreachable
> 2016-04-17 23:02:08,946 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.10.10.12': connect timed out
> 2016-04-17 23:02:08,946 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe39:183c%2': Network is unreachable
> 2016-04-17 23:02:08,997 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.2.15': connect timed out
> 2016-04-17 23:02:08,997 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/0:0:0:0:0:0:0:1%1': Network is unreachable
> 2016-04-17 23:02:08,997 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument
> 2016-04-17 23:02:08,998 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe6c:8d4c%3': Network is unreachable
> 2016-04-17 23:02:09,999 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.10.10.12': connect timed out
> 2016-04-17 23:02:10,000 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/fe80:0:0:0:a00:27ff:fe39:183c%2': Network is unreachable
> 2016-04-17 23:02:11,001 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/10.0.2.15': connect timed out
> 2016-04-17 23:02:11,001 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/0:0:0:0:0:0:0:1%1': Network is unreachable
> 2016-04-17 23:02:11,002 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument
> 2016-04-17 23:02:11,002 WARN org.apache.flink.runtime.net.ConnectionUtils - Could not connect to /10.10.10.11:6123. Selecting a local address using heuristics.
> 2016-04-17 23:02:11,003 INFO org.apache.flink.runtime.taskmanager.TaskManager - TaskManager will use hostname/address 'bigtop2.vagrant' (10.10.10.12) for communication.
> 2016-04-17 23:02:11,004 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager
> 2016-04-17 23:02:11,004 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager actor system at 10.10.10.12:6122
> 2016-04-17 23:02:11,570 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
> 2016-04-17 23:02:11,652 INFO Remoting - Starting remoting
> 2016-04-17 23:02:11,830 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager actor
> 2016-04-17 23:02:11,835 INFO Remoting - Remoting started; listening on addresses :[akka.tcp://[email protected]:6122]
> 2016-04-17 23:02:11,842 INFO org.apache.flink.runtime.io.network.netty.NettyConfig - NettyConfig [server address: bigtop2.vagrant/10.10.10.12, server port: 51100, memory segment size (bytes): 32768, transport type: NIO, number of server threads: 1 (manual), number of client threads: 1 (manual), server connect backlog: 0 (use Netty's default), client connect timeout (sec): 120, send/receive buffer size (bytes): 0 (use Netty's default)]
> 2016-04-17 23:02:11,845 INFO org.apache.flink.runtime.taskmanager.TaskManager - Messages between TaskManager and JobManager have a max timeout of 10000 milliseconds
> 2016-04-17 23:02:11,847 INFO org.apache.flink.runtime.taskmanager.TaskManager - Temporary file directory '/tmp': total 18 GB, usable 16 GB (88.89% usable)
> 2016-04-17 23:02:11,999 INFO org.apache.flink.runtime.io.network.buffer.NetworkBufferPool - Allocated 64 MB for network buffer pool (number of memory segments: 2048, bytes per segment: 32768).
> 2016-04-17 23:02:12,036 INFO org.apache.flink.runtime.taskmanager.TaskManager - Limiting managed memory to 0.7 of the currently free heap space (593 MB), memory will be allocated lazily.
> 2016-04-17 23:02:12,046 INFO org.apache.flink.runtime.io.disk.iomanager.IOManager - I/O manager uses directory /tmp/flink-io-28a3d591-2a02-4582-87e9-35a812c83b1c for spill files.
> 2016-04-17 23:02:12,058 INFO org.apache.flink.runtime.filecache.FileCache - User file cache uses directory /tmp/flink-dist-cache-3f0b2e74-86b5-4ef2-856f-37bca5ea18a1
> 2016-04-17 23:02:12,261 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager actor at akka://flink/user/taskmanager#1978479123.
> 2016-04-17 23:02:12,262 INFO org.apache.flink.runtime.taskmanager.TaskManager - TaskManager data connection information: bigtop2.vagrant (dataPort=51100)
> 2016-04-17 23:02:12,263 INFO org.apache.flink.runtime.taskmanager.TaskManager - TaskManager has 1 task slot(s).
> 2016-04-17 23:02:12,264 INFO org.apache.flink.runtime.taskmanager.TaskManager - Memory usage stats: [HEAP: 80/161/918 MB, NON HEAP: 26/28/304 MB (used/committed/max)]
> 2016-04-17 23:02:12,268 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://[email protected]:6123/user/jobmanager (attempt 1, timeout: 500 milliseconds)
> 2016-04-17 23:02:12,397 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://[email protected]:6123] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://[email protected]:6123]].
> 2016-04-17 23:02:12,796 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://[email protected]:6123/user/jobmanager (attempt 2, timeout: 1000 milliseconds)
> 2016-04-17 23:02:13,816 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://[email protected]:6123/user/jobmanager (attempt 3, timeout: 2000 milliseconds)
> 2016-04-17 23:02:15,836 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://[email protected]:6123/user/jobmanager (attempt 4, timeout: 4000 milliseconds)
> 2016-04-17 23:02:19,856 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://[email protected]:6123/user/jobmanager (attempt 5, timeout: 8000 milliseconds)
> 2016-04-17 23:02:19,863 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://[email protected]:6123] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://[email protected]:6123]].
> 2016-04-17 23:02:27,883 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://[email protected]:6123/user/jobmanager (attempt 6, timeout: 16000 milliseconds)
> 2016-04-17 23:02:27,889 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://[email protected]:6123] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://[email protected]:6123]].
> 2016-04-17 23:02:43,895 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://[email protected]:6123/user/jobmanager (attempt 7, timeout: 30 seconds)
> 2016-04-17 23:02:50,918 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://[email protected]:6123] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://[email protected]:6123]].
> 2016-04-17 23:03:13,915 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://[email protected]:6123/user/jobmanager (attempt 8, timeout: 30 seconds)
> 2016-04-17 23:03:13,921 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://[email protected]:6123] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://[email protected]:6123]].
> 2016-04-17 23:03:43,936 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://[email protected]:6123/user/jobmanager (attempt 9, timeout: 30 seconds)
> 2016-04-17 23:03:43,942 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://[email protected]:6123] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://[email protected]:6123]].
>
> --
> Regards,
> Harshita Agrawal
