[ https://issues.apache.org/jira/browse/FLINK-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Till Rohrmann closed FLINK-7580.
--------------------------------
       Resolution: Fixed
    Fix Version/s: 1.4.0

Fixed via 51d9a748d40f29b197b2d77fe8aa1b2439737da3

> Let LeaderGatewayRetriever implementations automatically retry failed gateway retrieval operations
> --------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-7580
>                 URL: https://issues.apache.org/jira/browse/FLINK-7580
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Coordination, Webfrontend
>    Affects Versions: 1.4.0
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>            Priority: Minor
>             Fix For: 1.4.0
>
>
> The {{LeaderGatewayRetriever}} implementations {{AkkaJobManagerRetriever}}
> and {{RpcGatewayRetriever}} should automatically retry failed gateway
> retrieval operations. A retrieval can fail if the {{WebRuntimeMonitor}} is
> started before the actual Akka/RPC component. I propose retrying a fixed
> number of times with a short delay in between. If the resolution still fails
> after the retries are exhausted, a new retrieval operation will be started
> when information is next requested from the {{WebRuntimeMonitor}}
> (see FLINK-7533). This ensures that the retry operation won't run forever,
> but also that it will eventually connect to the Akka/RPC component if it
> exists.
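The proposal above (retry a fixed number of times with a short delay in between) can be sketched roughly as follows. This is only an illustration of the idea, not the code from the fix commit: the class `RetrySketch`, the method `retryWithDelay`, and all parameter names are hypothetical, and the sketch uses only `java.util.concurrent` rather than any Flink or Akka API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical helper for illustration; not Flink's actual implementation. */
public final class RetrySketch {

    private RetrySketch() {}

    /**
     * Runs the asynchronous operation and, if it fails, retries it up to
     * {@code retries} more times, waiting {@code delayMillis} between attempts.
     * The returned future fails with the last error once retries are exhausted.
     */
    public static <T> CompletableFuture<T> retryWithDelay(
            Supplier<CompletableFuture<T>> operation,
            int retries,
            long delayMillis,
            ScheduledExecutorService scheduler) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(operation, retries, delayMillis, scheduler, result);
        return result;
    }

    private static <T> void attempt(
            Supplier<CompletableFuture<T>> operation,
            int retriesLeft,
            long delayMillis,
            ScheduledExecutorService scheduler,
            CompletableFuture<T> result) {
        if (result.isDone()) {
            return; // the caller cancelled or completed the result; stop retrying
        }
        CompletableFuture<T> current;
        try {
            current = operation.get();
        } catch (Throwable t) {
            // Treat a synchronous failure like a failed future.
            current = new CompletableFuture<>();
            current.completeExceptionally(t);
        }
        current.whenComplete((value, failure) -> {
            if (failure == null) {
                result.complete(value);
            } else if (retriesLeft > 0) {
                // Bounded retry: schedule the next attempt after a short delay.
                scheduler.schedule(
                        () -> attempt(operation, retriesLeft - 1, delayMillis, scheduler, result),
                        delayMillis,
                        TimeUnit.MILLISECONDS);
            } else {
                // Retries exhausted: give up so the operation does not run forever.
                result.completeExceptionally(failure);
            }
        });
    }
}
```

In this sketch a retriever would wrap its gateway lookup in the supplier. When the returned future fails after the last retry, the caller is free to start a fresh retrieval on the next request, which matches the "won't run forever but will eventually connect" behaviour described in the issue.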