[
https://issues.apache.org/jira/browse/FLINK-1908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14502734#comment-14502734
]
ASF GitHub Bot commented on FLINK-1908:
---------------------------------------
Github user rmetzger commented on a diff in the pull request:
https://github.com/apache/flink/pull/609#discussion_r28685121
--- Diff: flink-dist/src/main/flink-bin/bin/start-cluster.sh ---
@@ -37,6 +37,26 @@ fi
# cluster mode, bring up job manager locally and a task manager on every slave host
"$FLINK_BIN_DIR"/jobmanager.sh start cluster
+# wait until jobmanager starts
+JOBMANAGER_ADDR=$(readFromConfig ${KEY_JOBM_RPC_ADDR} "${DEFAULT_JOBM_RPC_ADDR}" "${YAML_CONF}")
+JOBMANAGER_PORT=$(readFromConfig ${KEY_JOBM_RPC_PORT} "${DEFAULT_JOBM_RPC_PORT}" "${YAML_CONF}")
+
+echo "Waiting for job manager"
+for i in {1..30}; do
+ nc -z "${JOBMANAGER_ADDR}" $JOBMANAGER_PORT
--- End diff --
Is Akka logging anything for these requests? (I suspect it's logging a
WARNING that an invalid client tried to connect.)
> JobManager startup delay isn't considered when using start-cluster.sh script
> ----------------------------------------------------------------------------
>
> Key: FLINK-1908
> URL: https://issues.apache.org/jira/browse/FLINK-1908
> Project: Flink
> Issue Type: Bug
> Components: Distributed Runtime
> Affects Versions: 0.9, 0.8.1
> Environment: Linux
> Reporter: Lukas Raska
> Priority: Minor
> Original Estimate: 5m
> Remaining Estimate: 5m
>
> When starting a Flink cluster via the start-cluster.sh script, JobManager startup
> can be delayed (as it's started asynchronously), which can cause several
> task managers to fail on startup.
> The solution is to wait a certain amount of time, periodically checking whether
> the RPC port is accessible, and then proceed with starting the task managers.
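
The polling approach described in the issue could be sketched as a small shell function. This is only an illustration under assumptions, not the code from the pull request: `wait_for_port` and `port_open` are hypothetical names that do not exist in the Flink scripts, and the probe relies on `nc -z` as in the diff above, which requires a netcat build that supports `-z`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the Flink scripts).

# Probe: succeeds if a TCP connection to host:port can be opened.
# Assumes a netcat that supports -z (connect only, send no data).
port_open() {
    nc -z "$1" "$2" >/dev/null 2>&1
}

# Poll host:port until it accepts connections or the attempts run out.
# Arguments: host, port, max attempts (default 30), delay in seconds (default 1).
# Returns 0 once the port is reachable, 1 if every attempt failed.
wait_for_port() {
    local host="$1"
    local port="$2"
    local attempts="${3:-30}"
    local delay="${4:-1}"
    local i=1
    while [ "$i" -le "$attempts" ]; do
        if port_open "$host" "$port"; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}
```

A caller might then run `wait_for_port "${JOBMANAGER_ADDR}" "${JOBMANAGER_PORT}" 30 1` before starting the task managers and print a warning when it returns non-zero. Bounding the number of attempts keeps the script from hanging forever if the job manager never comes up.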