[
https://issues.apache.org/jira/browse/FLINK-10821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16683423#comment-16683423
]
ASF GitHub Bot commented on FLINK-10821:
----------------------------------------
igalshilman commented on a change in pull request #7072: [FLINK-10821] Fix
resume from externalized checkpoints E2E Test
URL: https://github.com/apache/flink/pull/7072#discussion_r232572095
##########
File path:
flink-end-to-end-tests/test-scripts/test_resume_externalized_checkpoints.sh
##########
@@ -125,9 +126,20 @@ fi
echo "Restoring job with externalized checkpoint at $CHECKPOINT_PATH ..."
-BASE_JOB_CMD=`buildBaseJobCmd $NEW_DOP`
+BASE_JOB_CMD=`buildBaseJobCmd $NEW_DOP "-s file://${CHECKPOINT_PATH}"`
+JOB_CMD=""
+if [[ $SIMULATE_FAILURE == "true" ]]; then
Review comment:
Thanks for looking into it @tillrohrmann!
We construct a failing job (one that never actually fails, since all the
failure parameters are 0) so that the recovered job and the original job have
exactly the same operators. Omitting `--test.simulate_failure true` would drop
the `FailureMapper` from the recovered job.
If the jobs differ, recovering from the externalized checkpoint is not
possible without supplying `--allowNonRestoredState` (unless there is a
different way that I'm not aware of).
The problems with passing this argument are:
1. I think it makes the test more fragile: the recovered job could
accidentally start with a fresh state.
2. The `SemanticsCheckMapper` reports a violation in the following case:
```
run_test "Running HA per-job cluster (file, sync) end-to-end test" \
  "$END_TO_END_DIR/test-scripts/test_ha_per_job_cluster_datastream.sh file false false" \
  "skip_check_exceptions"
```
This looks like an artifact of how a job is constructed from its arguments
via `DataStreamAllroundTestJobFactory` and the missing `.uid()`, but
we can look into that in a follow-up ticket?
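
To illustrate the approach described above, here is a minimal sketch of how the
restore command could keep the `FailureMapper` in the job graph without ever
triggering a failure. The function name `build_restore_cmd` is hypothetical, and
the three `--test.simulate_failure.*` threshold flags are assumptions about the
test job's parameters; only `--test.simulate_failure true` is taken from this
discussion.

```shell
# Sketch only: build_restore_cmd is a hypothetical helper, not part of the PR.
# The num_records / num_checkpoints / max_failures flag names are assumed.
build_restore_cmd() {
  local base_cmd="$1"          # command that already carries the -s <checkpoint> argument
  local simulate_failure="$2"  # "true" if the original job was built with the FailureMapper

  if [[ "$simulate_failure" == "true" ]]; then
    # Keep the FailureMapper so the operator topology matches the original job,
    # but set every threshold to 0 so the mapper never actually fails.
    echo "$base_cmd --test.simulate_failure true" \
      "--test.simulate_failure.num_records 0" \
      "--test.simulate_failure.num_checkpoints 0" \
      "--test.simulate_failure.max_failures 0"
  else
    # The original job had no FailureMapper, so the base command already matches.
    echo "$base_cmd"
  fi
}
```

In the test script this might be called as
`JOB_CMD=$(build_restore_cmd "$BASE_JOB_CMD" "$SIMULATE_FAILURE")`, which would
avoid the need for `--allowNonRestoredState` because both jobs contain the same
operators.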
> Resuming Externalized Checkpoint E2E test does not resume from Externalized
> Checkpoint
> --------------------------------------------------------------------------------------
>
> Key: FLINK-10821
> URL: https://issues.apache.org/jira/browse/FLINK-10821
> Project: Flink
> Issue Type: Bug
> Components: E2E Tests
> Affects Versions: 1.7.0
> Reporter: Gary Yao
> Assignee: Igal Shilman
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 1.7.0
>
>
> Path to externalized checkpoint is not passed as the {{-s}} argument:
> https://github.com/apache/flink/blob/483507a65c7547347eaafb21a24967c470f94ed6/flink-end-to-end-tests/test-scripts/test_resume_externalized_checkpoints.sh#L128
> That is, the test currently restarts the job without a checkpoint.