GJL commented on a change in pull request #9060: [FLINK-13145][tests] Run HA dataset E2E test with new RestartPipelinedRegionStrategy
URL: https://github.com/apache/flink/pull/9060#discussion_r302172084
##########
File path: flink-end-to-end-tests/test-scripts/test_ha_dataset.sh
##########
@@ -53,20 +52,51 @@ function run_ha_test() {
wait_job_running ${JOB_ID}
- # start the watchdog that keeps the number of JMs stable
- start_ha_jm_watchdog 1 "StandaloneSessionClusterEntrypoint" start_jm_cmd "8081"
-
+ local c
for (( c=0; c<${JM_KILLS}; c++ )); do
# kill the JM and wait for watchdog to
# create a new one which will take over
kill_single 'StandaloneSessionClusterEntrypoint'
wait_job_running ${JOB_ID}
done
- cancel_job ${JOB_ID}
+ for (( c=0; c<${TM_KILLS}; c++ )); do
+ sleep $(( ( RANDOM % 10 ) + 1 ))
+ kill_and_replace_random_task_manager
+ wait_job_running ${JOB_ID}
+ done
+
+ wait_job_terminal_state ${JOB_ID} "FINISHED"
Review comment:
These are valid concerns.
> How much longer does the test now run for?
The test runs for 4.5-5 minutes on my machine. It takes around 2 minutes to
complete the batch job after the last injected fault (timed using
unscientific methods). The test in its current form is rather similar to
`test_batch_allround.sh`, so there is a chance that the two can be merged.
> I like neither option, do admit though that this would make it very
difficult (or even impossible) to verify the correctness of the output.
I don't see a good solution yet. Here are some options:
1. Make the job block on external signals (files), and make the job smaller
(smaller dataset)
1. Leave it as before, i.e., don't verify the correctness of the output
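To make option 1 more concrete, here is a minimal sketch of how the test script could coordinate with a job via signal files. All names (`SIGNAL_DIR`, `release_stage`, `wait_for_signal`, the `.go`/`.done` suffixes) are hypothetical illustrations, not part of the existing Flink test-script framework; the job-side UDF would poll for the `.go` file before proceeding and touch the `.done` file when finished.

```shell
#!/usr/bin/env bash
# Sketch: gate job progress on signal files so the test controls when
# each stage runs, allowing deterministic output verification.

SIGNAL_DIR="$(mktemp -d)"

# Test-side: allow a named stage of the job to proceed.
release_stage() {
    local stage="$1"
    touch "${SIGNAL_DIR}/${stage}.go"
}

# Test-side: block until the job marks a stage as done, or time out.
wait_for_signal() {
    local stage="$1"
    local timeout="${2:-60}"
    local waited=0
    while [[ ! -f "${SIGNAL_DIR}/${stage}.done" ]]; do
        sleep 1
        waited=$(( waited + 1 ))
        if (( waited >= timeout )); then
            echo "Timed out waiting for stage '${stage}'" >&2
            return 1
        fi
    done
}
```

With such helpers, the test could release one stage, inject a fault (e.g. `kill_and_replace_random_task_manager`), then wait for the stage to complete before checking the partial output.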
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services