[ 
https://issues.apache.org/jira/browse/FLINK-10821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16683426#comment-16683426
 ] 

ASF GitHub Bot commented on FLINK-10821:
----------------------------------------

igalshilman commented on a change in pull request #7072: [FLINK-10821] Fix 
resume from externalized checkpoints E2E Test
URL: https://github.com/apache/flink/pull/7072#discussion_r232572095
 
 

 ##########
 File path: 
flink-end-to-end-tests/test-scripts/test_resume_externalized_checkpoints.sh
 ##########
 @@ -125,9 +126,20 @@ fi
 
 echo "Restoring job with externalized checkpoint at $CHECKPOINT_PATH ..."
 
-BASE_JOB_CMD=`buildBaseJobCmd $NEW_DOP`
+BASE_JOB_CMD=`buildBaseJobCmd $NEW_DOP "-s file://${CHECKPOINT_PATH}"`
+JOB_CMD=""
+if [[ $SIMULATE_FAILURE == "true" ]]; then
 
 Review comment:
   Thanks for looking into it @tillrohrmann!
   We construct a failing job (one that doesn't actually fail, since all the 
parameters are 0) so that the recovered job and the original job have 
exactly the same operators. Omitting `--test.simulate_failure true` would omit 
the `FailureMapper` from the recovered job.
   If the jobs differ, recovering from the externalized checkpoint is not 
possible without supplying `--allowNonRestoredState` (unless there is a 
different way that I'm not aware of).
   The problems with passing this argument are:
   1. I think it makes the test more fragile, since the recovered job could 
accidentally start with fresh state.
   2. The `SemanticsCheckMapper` reports a violation in the following case:
   ```
   run_test "Running HA per-job cluster (file, sync) end-to-end test" 
"$END_TO_END_DIR/test-scripts/test_ha_per_job_cluster_datastream.sh file false 
false" "skip_check_exceptions"
   ``` 
   This looks like an artifact of how a job is constructed from its arguments 
via `DataStreamAllroundTestJobFactory` and the missing `.uid()`. Could we look 
into that in a follow-up ticket?
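
As a rough sketch of the idea described above (not the code from the PR), the restored job could be given the same failure-simulation switch as the original job, with every failure parameter set to 0 so that the `FailureMapper` stays in the job graph but never triggers. The helper name `build_restore_args` is hypothetical, and the three `--test.simulate_failure.*` parameter names other than `--test.simulate_failure` itself are assumptions that should be checked against `DataStreamAllroundTestJobFactory`.

```shell
# Hypothetical helper: prints the extra arguments for the restored job.
# $1 is "true" if the original job was started with failure simulation.
build_restore_args() {
  restore_args=""
  if [ "$1" = "true" ]; then
    # Keep the FailureMapper in the job graph so the operators match the
    # checkpointed job.
    restore_args="--test.simulate_failure true"
    # Assumed parameter names; all values are 0 so no failure is triggered.
    restore_args="$restore_args --test.simulate_failure.num_records 0"
    restore_args="$restore_args --test.simulate_failure.num_checkpoints 0"
    restore_args="$restore_args --test.simulate_failure.max_failures 0"
  fi
  printf '%s\n' "$restore_args"
}

# Usage: append the result to the base command built by the test script.
echo "extra args with failure simulation: $(build_restore_args true)"
echo "extra args without failure simulation: $(build_restore_args false)"
```

With this shape, `--allowNonRestoredState` is not needed, because the recovered job has the same operators as the job that wrote the checkpoint.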
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Resuming Externalized Checkpoint E2E test does not resume from Externalized 
> Checkpoint
> --------------------------------------------------------------------------------------
>
>                 Key: FLINK-10821
>                 URL: https://issues.apache.org/jira/browse/FLINK-10821
>             Project: Flink
>          Issue Type: Bug
>          Components: E2E Tests
>    Affects Versions: 1.7.0
>            Reporter: Gary Yao
>            Assignee: Igal Shilman
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.7.0
>
>
> Path to externalized checkpoint is not passed as the {{-s}} argument:
> https://github.com/apache/flink/blob/483507a65c7547347eaafb21a24967c470f94ed6/flink-end-to-end-tests/test-scripts/test_resume_externalized_checkpoints.sh#L128
> That is, the test currently restarts the job without checkpoint.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
