[ https://issues.apache.org/jira/browse/FLINK-8973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16415279#comment-16415279 ]
ASF GitHub Bot commented on FLINK-8973:
---------------------------------------
Github user twalthr commented on a diff in the pull request:
https://github.com/apache/flink/pull/5750#discussion_r177341094
--- Diff: flink-end-to-end-tests/test-scripts/common.sh ---
@@ -59,9 +162,42 @@ function start_cluster {
done
}
+function jm_watchdog() {
+    expectedJms=$1
+    ipPort=$2
+
+    while true; do
+        runningJms=`jps | grep -o 'StandaloneSessionClusterEntrypoint' | wc -l`;
+        missingJms=$((expectedJms-runningJms))
+        for (( c=0; c<missingJms; c++ )); do
+            "$FLINK_DIR"/bin/jobmanager.sh start "localhost" ${ipPort}
--- End diff ---
Does it make sense to start multiple job managers with the same `ipPort`?
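
For illustration, one way the watchdog could avoid that, sketched in the same shell style as the script above: derive a distinct port for each restarted instance from a base port. This is only a sketch of the reviewer's point, not code from the PR; the per-instance port offset and the reuse of the `jm_watchdog` signature are assumptions.

    function jm_watchdog() {
        local expectedJms=$1
        local basePort=$2

        while true; do
            runningJms=`jps | grep -o 'StandaloneSessionClusterEntrypoint' | wc -l`
            missingJms=$((expectedJms-runningJms))
            for (( c=0; c<missingJms; c++ )); do
                # Assumption: offset the port per missing instance so restarted
                # job managers do not collide on localhost. A real script would
                # also have to track which ports are actually free.
                "$FLINK_DIR"/bin/jobmanager.sh start "localhost" $((basePort+c))
            done
            sleep 5
        done
    }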
> End-to-end test: Run general purpose job with failures in standalone mode
> -------------------------------------------------------------------------
>
> Key: FLINK-8973
> URL: https://issues.apache.org/jira/browse/FLINK-8973
> Project: Flink
> Issue Type: Sub-task
> Components: Tests
> Affects Versions: 1.5.0
> Reporter: Till Rohrmann
> Assignee: Kostas Kloudas
> Priority: Blocker
> Fix For: 1.5.0
>
>
> We should set up an end-to-end test which runs the general purpose job
> (FLINK-8971) in a standalone setting with HA enabled (ZooKeeper). When
> running the job, the job's artificial failures should be activated.
> Additionally, we should randomly kill Flink processes (cluster entrypoint and
> TaskExecutors). When killing them, we should also spawn new processes to make
> up for the loss.
> This end-to-end test case should run with all the different state backend
> settings: {{RocksDB}} (full/incremental, async/sync) and {{FsStateBackend}}
> (sync/async).
> We should then verify that the general purpose job is successfully recovered
> without data loss or other failures.
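
As an illustration of the kill-and-respawn step described above, a minimal sketch in the style of the test scripts. The helper name `kill_random_taskmanager`, the class name matched via `jps`, and the use of `taskmanager.sh` to respawn are assumptions, not code from the PR.

    function kill_random_taskmanager() {
        # Assumption: in a flip-6 standalone cluster the TaskExecutor JVMs
        # show up in jps as TaskManagerRunner.
        local pids=(`jps | grep TaskManagerRunner | awk '{print $1}'`)
        if [ ${#pids[@]} -eq 0 ]; then
            return
        fi

        # Kill one TaskExecutor at random, hard, to simulate a process loss...
        local victim=${pids[$((RANDOM % ${#pids[@]}))]}
        kill -9 ${victim}

        # ...and immediately spawn a replacement to make up for the loss.
        "$FLINK_DIR"/bin/taskmanager.sh start
    }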
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)