[
https://issues.apache.org/jira/browse/FLINK-1908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503801#comment-14503801
]
ASF GitHub Bot commented on FLINK-1908:
---------------------------------------
Github user tillrohrmann commented on the pull request:
https://github.com/apache/flink/pull/609#issuecomment-94581541
@DarkKnightCZ that sounds strange. The TM should not terminate itself if it
cannot connect to the JM unless the maximum registration duration has been
configured. Could you link the log file of one of the failed TMs? That would
allow us to investigate the problem more thoroughly.
> JobManager startup delay isn't considered when using start-cluster.sh script
> ----------------------------------------------------------------------------
>
> Key: FLINK-1908
> URL: https://issues.apache.org/jira/browse/FLINK-1908
> Project: Flink
> Issue Type: Bug
> Components: Distributed Runtime
> Affects Versions: 0.9, 0.8.1
> Environment: Linux
> Reporter: Lukas Raska
> Priority: Minor
> Original Estimate: 5m
> Remaining Estimate: 5m
>
> When starting a Flink cluster via the start-cluster.sh script, JobManager
> startup can be delayed (as it's started asynchronously), which can result in
> failed startup of several task managers.
> A solution is to wait a certain amount of time and periodically check whether
> the RPC port is accessible, then proceed with starting the task managers.
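
The polling idea proposed in the issue description could be sketched roughly as
follows. This is only an illustration, not the change made in the pull request:
the function name `wait_for_port` and its parameters are hypothetical, and
Python is used here for brevity rather than the shell of start-cluster.sh.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=30.0, interval_s=0.5):
    """Poll host:port until a TCP connection succeeds or timeout_s elapses.

    Hypothetical helper: the name and defaults are assumptions made for this
    sketch, not part of Flink or its scripts.
    Returns True if the port became reachable, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Connection refused or timed out: the service is not up yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A start script could call such a check with the JobManager host and RPC port
taken from the cluster configuration and launch the task managers only after it
returns True. A successful TCP connect shows only that the port is open, not
that the JobManager is ready to accept registrations.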
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)