[https://issues.apache.org/jira/browse/FLINK-8973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16415284#comment-16415284]
ASF GitHub Bot commented on FLINK-8973:
---------------------------------------
Github user twalthr commented on a diff in the pull request:
https://github.com/apache/flink/pull/5750#discussion_r177342612
--- Diff: flink-end-to-end-tests/test-scripts/common.sh ---
@@ -59,6 +146,57 @@ function start_cluster {
done
}
+function jm_watchdog() {
+ expectedJms=$1
+ ipPort=$2
+
+ while true; do
+ runningJms=`jps | grep -o 'StandaloneSessionClusterEntrypoint' | wc -l`;
+ missingJms=$((expectedJms-runningJms))
+ for (( c=0; c<missingJms; c++ )); do
+ "$FLINK_DIR"/bin/jobmanager.sh start "localhost" $2
+ done
+ sleep 5;
+ done
+}
+
+function kill_jm {
+ idx=$1
+
+ jm_pids=`jps | grep 'StandaloneSessionClusterEntrypoint' | cut -d " " -f 1`
+ jm_pids=(${jm_pids[@]})
+
+ pid=${jm_pids[$idx]}
+
+ # kill the JM and wait for the completion of its termination
--- End diff --
We should remove the `wait for completion` comment.
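If the test did need to block until a killed JobManager has really terminated, the shell builtin `wait` would not help: it only works for children of the current shell, and a JM started through `jobmanager.sh` is not one. A minimal sketch, assuming a hypothetical helper `kill_and_await_exit` (not part of the PR), which sends SIGTERM and then polls with `kill -0` until the PID disappears or a timeout expires:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the Flink PR: send SIGTERM to a PID and
# poll until the process is gone or the timeout (in seconds) expires.
# `wait` is not used because the process is generally not a child of this shell.
kill_and_await_exit() {
    local pid=$1
    local timeout_secs=${2:-30}
    local waited=0

    kill "$pid" 2>/dev/null

    # kill -0 sends no signal; it only checks whether the PID can still be signalled
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout_secs" ]; then
            echo "process $pid did not exit within ${timeout_secs}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

Inside `kill_jm`, such a helper could be called as `kill_and_await_exit "$pid" 60` once the PID has been looked up; without something like it, the comment in the diff overstates what the code does.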
> End-to-end test: Run general purpose job with failures in standalone mode
> -------------------------------------------------------------------------
>
> Key: FLINK-8973
> URL: https://issues.apache.org/jira/browse/FLINK-8973
> Project: Flink
> Issue Type: Sub-task
> Components: Tests
> Affects Versions: 1.5.0
> Reporter: Till Rohrmann
> Assignee: Kostas Kloudas
> Priority: Blocker
> Fix For: 1.5.0
>
>
> We should set up an end-to-end test which runs the general purpose job
> (FLINK-8971) in a standalone setting with HA enabled (ZooKeeper). When
> running the job, its simulated failures should be activated.
> Additionally, we should randomly kill Flink processes (cluster entrypoint and
> TaskExecutors). When killing them, we should also spawn new processes to make
> up for the loss.
> This end-to-end test case should run with all state backend settings:
> {{RocksDB}} (full/incremental, async/sync) and {{FsStateBackend}}
> (sync/async).
> We should then verify that the general purpose job is successfully recovered
> without data loss or other failures.
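The requested backend matrix has 2 × 2 = 4 RocksDB combinations plus 2 FsStateBackend combinations (incremental checkpoints are RocksDB-only), i.e. 6 runs. A minimal sketch of enumerating them, assuming a hypothetical `print_backend_matrix` helper; the labels `rocks`/`fs` and the `key=value` output are made up for illustration, and the real end-to-end scripts may configure backends differently:

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not taken from the Flink repository: enumerate the
# state backend settings listed in FLINK-8973, one configuration per line.
print_backend_matrix() {
    local async incremental
    for async in true false; do
        # RocksDB: full (incremental=false) and incremental checkpoints
        for incremental in false true; do
            echo "backend=rocks incremental=$incremental async=$async"
        done
        # FsStateBackend has no incremental mode, so only async/sync varies
        echo "backend=fs incremental=false async=$async"
    done
}

# In a real test, each configuration would drive one HA run
# (start cluster, submit job, kill/respawn processes, verify output).
print_backend_matrix | while read -r config; do
    echo "would run HA test with: $config"
done
```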