[ https://issues.apache.org/jira/browse/FLINK-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16205511#comment-16205511 ]
ASF GitHub Bot commented on FLINK-4816:
---------------------------------------

GitHub user tony810430 opened a pull request:

    https://github.com/apache/flink/pull/4828

    [FLINK-4816] [checkpoints] Executions failed from "DEPLOYING" should retain restored checkpoint information

    ## What is the purpose of the change

    This PR is based on #3478 and adds some improvements.

    ## Brief change log

    - Rebased #3478 onto the latest master branch.
    - Checked whether a CheckpointCoordinator exists.
    - Added corresponding tests.

    ## Verifying this change

    - Updated tests in `CheckpointCoordinatorTest` and `ExecutionVertexDeploymentTest`.
    - Added a new test in `ExecutionVertexDeploymentTest` for a deployment that fails after restoring.

    ## Does this pull request potentially affect one of the following parts:

    - Dependencies (does it add or upgrade a dependency): **no**
    - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: **no**
    - The serializers: **don't know**
    - The runtime per-record code paths (performance sensitive): **don't know**
    - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: **yes**

    ## Documentation

    - Does this pull request introduce a new feature? **no**
    - If yes, how is the feature documented? **not documented**

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tony810430/flink FLINK-4816

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/4828.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4828

----
commit 5ed707775be9d0b61edf62884e97a7562acfc787
Author: Ramkrishna <ramkrishna.s.vasude...@intel.com>
Date:   2017-03-06T11:25:37Z

    [FLINK-4816] Executions failed from "DEPLOYING" should retain restored checkpoint information

commit 03678006a9729bf5339b812e032e57727d6409f6
Author: Ramkrishna <ramkrishna.s.vasude...@intel.com>
Date:   2017-03-06T11:41:58Z

    Add lock to getRestoredCheckpointID

commit 2ddcc511e3f6716359bb47edba0ed5ad5be0ec5f
Author: Tony Wei <tony19920...@gmail.com>
Date:   2017-10-16T02:21:22Z

    check if CheckpointCoordinator is enable, add corresponding unit tests

----

> Executions failed from "DEPLOYING" should retain restored checkpoint information
> --------------------------------------------------------------------------------
>
>                 Key: FLINK-4816
>                 URL: https://issues.apache.org/jira/browse/FLINK-4816
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Distributed Coordination
>            Reporter: Stephan Ewen
>
> When an execution fails from state {{DEPLOYING}}, it should wrap the failure
> to better report the failure cause:
>   - If no checkpoint was restored, it should wrap the exception in a
> {{DeployTaskException}}
>   - If a checkpoint was restored, it should wrap the exception in a
> {{RestoreTaskException}} and record the id of the checkpoint that was
> attempted to be restored.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
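
The wrapping behaviour requested in the issue description can be sketched roughly as follows. This is only an illustration and not the code in the pull request: the exception names `DeployTaskException` and `RestoreTaskException` come from the issue text, while their constructors, the `getRestoredCheckpointId` accessor, the `DeployFailureWrapper` helper, and the use of `OptionalLong` to signal "no checkpoint restored" are all assumptions made for this example.

```java
import java.util.OptionalLong;

// Hypothetical exception type, named after the issue description.
// Used when the task failed in DEPLOYING and no checkpoint had been restored.
class DeployTaskException extends Exception {
    DeployTaskException(Throwable cause) {
        super("Task failed while deploying; no checkpoint was restored", cause);
    }
}

// Hypothetical exception type, named after the issue description.
// Records the id of the checkpoint whose restore was attempted.
class RestoreTaskException extends Exception {
    private final long restoredCheckpointId;

    RestoreTaskException(long restoredCheckpointId, Throwable cause) {
        super("Task failed while deploying after restoring checkpoint "
                + restoredCheckpointId, cause);
        this.restoredCheckpointId = restoredCheckpointId;
    }

    long getRestoredCheckpointId() {
        return restoredCheckpointId;
    }
}

// Hypothetical helper (not a Flink class): picks the wrapper exception
// depending on whether a checkpoint id is present.
final class DeployFailureWrapper {
    private DeployFailureWrapper() {}

    static Exception wrap(Throwable cause, OptionalLong restoredCheckpointId) {
        if (restoredCheckpointId.isPresent()) {
            return new RestoreTaskException(restoredCheckpointId.getAsLong(), cause);
        }
        return new DeployTaskException(cause);
    }
}
```

A caller handling a failure in the `DEPLOYING` state would pass the original failure and the restored checkpoint id, if any, for example `DeployFailureWrapper.wrap(failure, OptionalLong.of(42L))`. The original failure stays reachable through `getCause()`.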