Hi, I've set up Flink HA on AWS (3 Taskmanagers and 2 Jobmanagers, each on an EC2 m4.large instance, with checkpointing enabled on S3). My topology works fine, but after a few hours I see that the Taskmanagers get detached from the Jobmanager. When this happened, I tried to reach the Jobmanager using telnet and it worked, but the Taskmanager does not succeed in connecting again. It reattaches only after I restart it. I tried the following settings, but the problem persists:
akka.ask.timeout: 20 s
akka.lookup.timeout: 20 s
akka.watch.heartbeat.interval: 20 s

Please find attached a snapshot from one of the Taskmanagers. Is there any setting that I need to configure?

--
Thanks,
Deepak Jha