Aleksandr Filichkin created FLINK-8829:

             Summary: Flink in EMR(YARN) is down due to Akka communication issue
                 Key: FLINK-8829
             Project: Flink
          Issue Type: Bug
          Components: YARN
    Affects Versions: 1.3.2
            Reporter: Aleksandr Filichkin


We have a Flink 1.3.2 application running on Amazon EMR. Every week our Flink job goes down
with the following errors:

2018-02-16 19:00:04,595 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system
 has failed, address is now gated for [5000] ms. Reason: [Association failed
 Caused by: [Connection refused:]
2018-02-16 19:00:05,593 WARN akka.remote.RemoteWatcher - Detected unreachable:
2018-02-16 19:00:05,596 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - Lost connection to
 Triggering connection timeout.

Do you have any ideas on how to troubleshoot this?
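One way to start is to line up these events by timestamp across the weekly failures. The following is a minimal sketch that assumes the single-line layout quoted above (`date time,millis LEVEL logger - message`). The logger prefixes in `INTERESTING`, the helper names, and running it as a standalone script are illustrative choices, not anything Flink provides. Wrapped continuation lines (such as a separate "Caused by" line or stack traces) do not match the pattern and are skipped.

```python
import re
import sys
from datetime import datetime

# Assumed entry layout, matching the log lines quoted above:
# "<yyyy-mm-dd> <hh:mm:ss,mmm> <LEVEL> <logger> - <message>"
LOG_PATTERN = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<logger>\S+)\s+-\s+(?P<msg>.*)$"
)

# Hypothetical choice of loggers to follow: Akka remoting and the Flink job client.
INTERESTING = ("akka.remote.", "org.apache.flink.runtime.client.")


def parse_line(line):
    """Return (timestamp, level, logger, message), or None for non-matching lines."""
    m = LOG_PATTERN.match(line.strip())
    if m is None:
        return None  # e.g. wrapped continuation lines or stack-trace lines
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
    return ts, m.group("level"), m.group("logger"), m.group("msg")


def connection_events(lines):
    """Yield parsed entries from the INTERESTING loggers, in file order."""
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry[2].startswith(INTERESTING):
            yield entry


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_log.py <path to a JobManager or client log>
    with open(sys.argv[1]) as f:
        for ts, level, logger, msg in connection_events(f):
            print(ts.isoformat(sep=" "), level, logger, msg)
```

Comparing the timestamps of the first `ReliableDeliverySupervisor` warning across several weekly incidents may show whether the failures line up with something external to Flink, such as a scheduled restart of the remote host.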


This message was sent by Atlassian JIRA
