[
https://issues.apache.org/jira/browse/FLINK-8973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16411543#comment-16411543
]
ASF GitHub Bot commented on FLINK-8973:
---------------------------------------
Github user fhueske commented on a diff in the pull request:
https://github.com/apache/flink/pull/5750#discussion_r176767031
--- Diff: flink-end-to-end-tests/test-scripts/common.sh ---
@@ -59,6 +146,57 @@ function start_cluster {
done
}
+function jm_watchdog() {
+ expectedJms=$1
+ ipPort=$2
+
+ while true; do
+ runningJms=`jps | grep -o 'StandaloneSessionClusterEntrypoint' | wc -l`;
+ missingJms=$((expectedJms-runningJms))
+ for (( c=0; c<missingJms; c++ )); do
+ "$FLINK_DIR"/bin/jobmanager.sh start "localhost" $2
+ done
+ sleep 5;
+ done
+}
+
+function kill_jm {
+ idx=$1
+
+ jm_pids=`jps | grep 'StandaloneSessionClusterEntrypoint' | cut -d " " -f 1`
+ jm_pids=(${jm_pids[@]})
+
+ pid=${jm_pids[$idx]}
+
+ # kill the JM and wait for the completion of its termination
+ kill -9 ${pid}
+
+ echo "Killed JM @ ${pid}."
+}
+
+function stop_ha_cluster {
+ echo "Tearing down HA Cluster..."
+ stop_cluster
+ stop_local_zk
+ cleanup
+}
+
+function stop_local_zk {
+ while read server ; do
+ server=$(echo -e "${server}" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//') # trim
+
+ # match server.id=address[:port[:port]]
+ if [[ $server =~ ^server\.([0-9]+)[[:space:]]*\=[[:space:]]*([^: \#]+) ]]; then
+ id=${BASH_REMATCH[1]}
--- End diff --
But the assignment should never happen if we enter the `else` branch.
There is another assignment to `server` outside of the condition.
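For reference, the matching step under discussion can be isolated as a small sketch. The diff above is cut off right after the `id=` assignment, so this is not the code from the pull request: `parse_server_line` is a hypothetical helper name, and the only things taken from the diff are the trim step and the `server.id=address[:port[:port]]` pattern. It keeps every assignment inside the branch where the regex matched, so a non-matching line assigns nothing.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of common.sh): parse one "server.id=address[:port[:port]]" line.
# Prints "<id> <address>" and returns 0 on a match; prints nothing and returns 1 otherwise.
parse_server_line() {
    local line=$1
    local id address

    # Trim leading and trailing whitespace, as the diff does.
    line=$(printf '%s' "$line" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')

    # Keeping the pattern in a variable avoids quoting surprises on the right side of =~.
    local pattern='^server\.([0-9]+)[[:space:]]*=[[:space:]]*([^: #]+)'

    if [[ $line =~ $pattern ]]; then
        # Assignments happen only when the regex matched.
        id=${BASH_REMATCH[1]}
        address=${BASH_REMATCH[2]}
        printf '%s %s\n' "$id" "$address"
        return 0
    fi

    # No match: nothing is assigned or printed.
    return 1
}
```

Called as `parse_server_line 'server.1=localhost:2888:3888'`, this prints `1 localhost`; a line such as `clientPort=2181` produces no output and a non-zero exit status.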
> End-to-end test: Run general purpose job with failures in standalone mode
> -------------------------------------------------------------------------
>
> Key: FLINK-8973
> URL: https://issues.apache.org/jira/browse/FLINK-8973
> Project: Flink
> Issue Type: Sub-task
> Components: Tests
> Affects Versions: 1.5.0
> Reporter: Till Rohrmann
> Assignee: Kostas Kloudas
> Priority: Blocker
> Fix For: 1.5.0
>
>
> We should set up an end-to-end test which runs the general purpose job
> (FLINK-8971) in a standalone setting with HA enabled (ZooKeeper). When
> running the job, the job failures should be activated.
> Additionally, we should randomly kill Flink processes (cluster entrypoint and
> TaskExecutors). When killing them, we should also spawn new processes to make
> up for the loss.
> This end-to-end test case should run with all different state backend
> settings: {{RocksDB}} (full/incremental, async/sync), {{FsStateBackend}}
> (sync/async).
> We should then verify that the general purpose job is successfully recovered
> without data loss or other failures.
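The state backend settings listed in the description amount to six combinations. A minimal sketch of how a test script might enumerate them follows; the labels (`rocksdb`, `fs`, `full`, `incremental`) and the function names are illustrative assumptions, and `run_ha_test` is a hypothetical stand-in for whatever function actually launches the test.

```shell
#!/usr/bin/env bash
# Enumerate the state backend settings named in the issue description.
# Each output line is "<backend> <checkpoint type> <snapshot mode>"; the labels are illustrative only.
list_backend_settings() {
    local checkpoints snapshots

    # RocksDB: full/incremental crossed with async/sync.
    for checkpoints in full incremental; do
        for snapshots in async sync; do
            printf 'rocksdb %s %s\n' "$checkpoints" "$snapshots"
        done
    done

    # FsStateBackend: the description lists only sync/async, so the
    # checkpoint-type column is filled with a "-" placeholder.
    for snapshots in async sync; do
        printf 'fs - %s\n' "$snapshots"
    done
}

# Hypothetical driver: run_ha_test is assumed to exist elsewhere and is not defined here.
run_all_backend_settings() {
    local backend checkpoints snapshots
    while read -r backend checkpoints snapshots; do
        run_ha_test "$backend" "$checkpoints" "$snapshots"
    done < <(list_backend_settings)
}
```

Separating the enumeration from the driver keeps the list of combinations easy to inspect on its own, for example with `list_backend_settings | wc -l`.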
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)