This is an automated email from the ASF dual-hosted git repository.

hangxiang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/flink.git
commit d0dbd51c89df6374d5eb54e12925e032a255385e
Author: 周仁祥 <[email protected]>
AuthorDate: Tue Dec 12 16:19:43 2023 +0800

    [FLINK-32881][checkpoint] update docs for savepoint detached option
---
 docs/content.zh/docs/deployment/cli.md       | 39 ++++++++++++++++++++++++++++
 docs/content.zh/docs/ops/state/savepoints.md | 10 +++++++
 docs/content.zh/docs/ops/upgrading.md        |  3 +++
 docs/content/docs/deployment/cli.md          | 39 ++++++++++++++++++++++++++++
 docs/content/docs/ops/state/savepoints.md    | 12 +++++++++
 docs/content/docs/ops/upgrading.md           |  3 +++
 6 files changed, 106 insertions(+)

diff --git a/docs/content.zh/docs/deployment/cli.md b/docs/content.zh/docs/deployment/cli.md
index 544dcb835b8..d71dbdd4e2c 100644
--- a/docs/content.zh/docs/deployment/cli.md
+++ b/docs/content.zh/docs/deployment/cli.md
@@ -125,6 +125,43 @@ Lastly, you can optionally provide what should be the [binary format]({{< ref "d
 
 The path to the savepoint can be used later on to [restart the Flink job](#starting-a-job-from-a-savepoint).
 
+If the state of the job is large, the client may fail with a timeout exception because it has to wait for the savepoint to finish:
+```
+Triggering savepoint for job bec5244e09634ad71a80785937a9732d.
+Waiting for response...
+
+--------------------------------------------------------------
+The program finished with the following exception:
+
+org.apache.flink.util.FlinkException: Triggering a savepoint for the job bec5244e09634ad71a80785937a9732d failed.
+	at org.apache.flink.client.cli.CliFrontend.triggerSavepoint(CliFrontend.java:828)
+	at org.apache.flink.client.cli.CliFrontend.lambda$savepoint$8(CliFrontend.java:794)
+	at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:1078)
+	at org.apache.flink.client.cli.CliFrontend.savepoint(CliFrontend.java:779)
+	at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1150)
+	at org.apache.flink.client.cli.CliFrontend.lambda$mainInternal$9(CliFrontend.java:1226)
+	at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
+	at org.apache.flink.client.cli.CliFrontend.mainInternal(CliFrontend.java:1226)
+	at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1194)
+Caused by: java.util.concurrent.TimeoutException
+	at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
+	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
+	at org.apache.flink.client.cli.CliFrontend.triggerSavepoint(CliFrontend.java:822)
+	... 8 more
+```
+In this case, use the `-detached` option to trigger the savepoint in detached mode; the client returns the trigger ID immediately.
+```bash
+$ ./bin/flink savepoint \
+      $JOB_ID \
+      /tmp/flink-savepoints \
+      -detached
+```
+```
+Triggering savepoint in detached mode for job bec5244e09634ad71a80785937a9732d.
+Successfully trigger manual savepoint, triggerId: 2505bbd12c5b58fd997d0f193db44b97
+```
+You can check the status of the detached savepoint through the [REST API]({{< ref "docs/ops/rest_api" >}}/#jobs-jobid-checkpoints-triggerid).
+
 #### Disposing a Savepoint
 
 The `savepoint` action can be also used to remove savepoints. `--dispose` with the corresponding
@@ -214,6 +251,8 @@ Use the `--drain` flag if you want to terminate the job permanently.
 If you want to resume the job at a later point in time, then do not drain the pipeline because it could lead to incorrect results when the job is resumed.
 {{< /hint >}}
 
+If you want to trigger the savepoint in detached mode, add the option `-detached` to the command.
+
 Lastly, you can optionally provide what should be the [binary format]({{< ref "docs/ops/state/savepoints" >}}#savepoint-format) of the savepoint.
 
 #### Cancelling a Job Ungracefully
diff --git a/docs/content.zh/docs/ops/state/savepoints.md b/docs/content.zh/docs/ops/state/savepoints.md
index 7b14669db4f..2294fc97923 100644
--- a/docs/content.zh/docs/ops/state/savepoints.md
+++ b/docs/content.zh/docs/ops/state/savepoints.md
@@ -142,6 +142,14 @@ $ bin/flink savepoint :jobId [:targetDirectory]
 $ bin/flink savepoint --type [native/canonical] :jobId [:targetDirectory]
 ```
 
+When a savepoint is triggered with the above command, the client has to wait for the savepoint to complete, so the client may time out when the job's state is large. In this case, you can trigger the savepoint in detached mode.
+
+```shell
+$ bin/flink savepoint :jobId [:targetDirectory] -detached
+```
+
+With this command, the client returns immediately after obtaining the trigger id of the savepoint, and you can monitor the progress of the savepoint through the [REST API]({{< ref "docs/ops/rest_api" >}}/#jobs-jobid-checkpoints-triggerid).
+
 #### Trigger a Savepoint with YARN
 
 ```shell
@@ -160,6 +168,8 @@ $ bin/flink stop --type [native/canonical] --savepointPath [:targetDirectory] :j
 
 This will automatically trigger a savepoint for the job with ID `:jobid` and stop the job. In addition, you can specify a target file system directory to store the savepoint in. The directory needs to be accessible by the JobManager(s) and TaskManager(s). You can also specify the format in which the savepoint is created; if none is specified, the savepoint is created in canonical format.
 
+If you want to trigger the savepoint in detached mode, just add the option `-detached` to the command.
+
 ### Resuming from Savepoints
 
 ```shell
diff --git a/docs/content.zh/docs/ops/upgrading.md b/docs/content.zh/docs/ops/upgrading.md
index 322c00a566d..ff433609001 100644
--- a/docs/content.zh/docs/ops/upgrading.md
+++ b/docs/content.zh/docs/ops/upgrading.md
@@ -70,6 +70,8 @@ That same code would have to be recompiled when upgrading to 1.16.0 though.
 ```
 It is recommended to take savepoints periodically so that the application can be restarted from a previous point in time.
 
+If you want to trigger a savepoint in detached mode, just add the option `-detached`.
+
 * Take a savepoint and stop the application.
 ```bash
 > ./bin/flink cancel -s [path to savepoint] <jobID>
@@ -216,6 +218,7 @@ val mappedEvents: DataStream[(Int, Long)] = events
 ```shell
 $ bin/flink stop [--savepointPath :savepointPath] :jobId
 ```
+If you want to trigger the savepoint in detached mode, just add the option `-detached` to the command.
 
 For more details, please read the [savepoint documentation]({{< ref "docs/ops/state/savepoints" >}}).
 
diff --git a/docs/content/docs/deployment/cli.md b/docs/content/docs/deployment/cli.md
index 198c1e1c93b..a8818a4fb6b 100644
--- a/docs/content/docs/deployment/cli.md
+++ b/docs/content/docs/deployment/cli.md
@@ -123,6 +123,43 @@ Lastly, you can optionally provide what should be the [binary format]({{< ref "d
 
 The path to the savepoint can be used later on to [restart the Flink job](#starting-a-job-from-a-savepoint).
 
+If the state of the job is large, the client may fail with a timeout exception because it has to wait for the savepoint to finish:
+```
+Triggering savepoint for job bec5244e09634ad71a80785937a9732d.
+Waiting for response...
+
+--------------------------------------------------------------
+The program finished with the following exception:
+
+org.apache.flink.util.FlinkException: Triggering a savepoint for the job bec5244e09634ad71a80785937a9732d failed.
+	at org.apache.flink.client.cli.CliFrontend.triggerSavepoint(CliFrontend.java:828)
+	at org.apache.flink.client.cli.CliFrontend.lambda$savepoint$8(CliFrontend.java:794)
+	at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:1078)
+	at org.apache.flink.client.cli.CliFrontend.savepoint(CliFrontend.java:779)
+	at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1150)
+	at org.apache.flink.client.cli.CliFrontend.lambda$mainInternal$9(CliFrontend.java:1226)
+	at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
+	at org.apache.flink.client.cli.CliFrontend.mainInternal(CliFrontend.java:1226)
+	at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1194)
+Caused by: java.util.concurrent.TimeoutException
+	at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
+	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
+	at org.apache.flink.client.cli.CliFrontend.triggerSavepoint(CliFrontend.java:822)
+	... 8 more
+```
+In this case, use the `-detached` option to trigger the savepoint in detached mode; the client returns as soon as it receives the trigger ID.
+```bash
+$ ./bin/flink savepoint \
+      $JOB_ID \
+      /tmp/flink-savepoints \
+      -detached
+```
+```
+Triggering savepoint in detached mode for job bec5244e09634ad71a80785937a9732d.
+Successfully trigger manual savepoint, triggerId: 2505bbd12c5b58fd997d0f193db44b97
+```
+You can check the status of the detached savepoint through the [REST API]({{< ref "docs/ops/rest_api" >}}/#jobs-jobid-checkpoints-triggerid).
+
 #### Disposing a Savepoint
 
 The `savepoint` action can be also used to remove savepoints. `--dispose` with the corresponding
@@ -212,6 +249,8 @@ Use the `--drain` flag if you want to terminate the job permanently.
 If you want to resume the job at a later point in time, then do not drain the pipeline because it could lead to incorrect results when the job is resumed.
 {{< /hint >}}
 
+If you want to trigger the savepoint in detached mode, add the option `-detached` to the command.
+
 Lastly, you can optionally provide what should be the [binary format]({{< ref "docs/ops/state/savepoints" >}}#savepoint-format) of the savepoint.
 
 #### Cancelling a Job Ungracefully
diff --git a/docs/content/docs/ops/state/savepoints.md b/docs/content/docs/ops/state/savepoints.md
index c08587ec178..c13cd62e6cc 100644
--- a/docs/content/docs/ops/state/savepoints.md
+++ b/docs/content/docs/ops/state/savepoints.md
@@ -167,6 +167,16 @@ the savepoint should be taken. By default the savepoint will be taken in canonic
 $ bin/flink savepoint --type [native/canonical] :jobId [:targetDirectory]
 ```
 
+When using the above command to trigger a savepoint, the client needs to wait for the savepoint
+to be completed. Therefore, the client may time out when the state size of the job is large.
+In this case, you can trigger the savepoint in detached mode.
+
+```shell
+$ bin/flink savepoint :jobId [:targetDirectory] -detached
+```
+When using this command, the client returns immediately after getting the trigger id of
+the savepoint. You can monitor the status of the savepoint through the [REST API]({{< ref "docs/ops/rest_api" >}}/#jobs-jobid-checkpoints-triggerid).
+
 #### Trigger a Savepoint with YARN
 
 ```shell
@@ -186,6 +196,8 @@ you can specify a target file system directory to store the savepoint in. The di
 accessible by the JobManager(s) and TaskManager(s). You can also pass a type in which the savepoint
 should be taken. By default the savepoint will be taken in canonical format.
 
+If you want to trigger the savepoint in detached mode, add the option `-detached` to the command.
+
 ### Resuming from Savepoints
 
 ```shell
diff --git a/docs/content/docs/ops/upgrading.md b/docs/content/docs/ops/upgrading.md
index cc7d5e28cd8..b06427c2a96 100644
--- a/docs/content/docs/ops/upgrading.md
+++ b/docs/content/docs/ops/upgrading.md
@@ -103,6 +103,7 @@ There are two ways of taking a savepoint from a running streaming application.
 > ./bin/flink savepoint <jobID> [pathToSavepoint]
 ```
 It is recommended to periodically take savepoints in order to be able to restart an application from a previous point in time.
+If you want to trigger a savepoint in detached mode, just add the option `-detached`.
 
 * Taking a savepoint and stopping the application as a single action.
 ```bash
@@ -251,6 +252,8 @@ You can do this with the command:
 $ bin/flink stop [--savepointPath :savepointPath] :jobId
 ```
 
+If you want to trigger the savepoint in detached mode, add the option `-detached` to the command.
+
 For more details, please read the [savepoint documentation]({{< ref "docs/ops/state/savepoints" >}}).
 
 #### STEP 2: Update your cluster to the new Flink version.
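In detached mode the CLI only prints a trigger ID, so waiting for the savepoint result is left to the caller. Below is a minimal shell sketch of that follow-up. It rests on several assumptions:

* The JobManager REST endpoint is reachable, for example at `http://localhost:8081`.
* Savepoint status is served at `/jobs/:jobid/savepoints/:triggerid`. Check the REST API reference for your Flink version.
* The response is compact JSON whose `status.id` reads `IN_PROGRESS` until the savepoint finishes.

The helpers `extract_trigger_id`, `savepoint_status_url` and `wait_for_savepoint` are hypothetical names, not part of Flink.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for following up on `flink savepoint ... -detached`.
# Assumptions: the status endpoint path and the JSON shape matched below.

# Read the CLI output on stdin and print the 32-hex-digit trigger ID taken from
# the "Successfully trigger manual savepoint, triggerId: ..." line.
extract_trigger_id() {
  sed -n 's/.*triggerId: \([0-9a-f]\{32\}\).*/\1/p'
}

# Build the (assumed) status URL <rest>/jobs/<jobId>/savepoints/<triggerId>,
# dropping a trailing slash from the REST base URL.
savepoint_status_url() {
  local rest="${1%/}" job="$2" trigger="$3"
  printf '%s/jobs/%s/savepoints/%s\n' "$rest" "$job" "$trigger"
}

# Poll the status URL every INTERVAL seconds (default 5) while the response
# reports IN_PROGRESS, then print the final JSON body.
# Returns 1 if curl fails (connection error or HTTP error status).
wait_for_savepoint() {
  local url="$1" interval="${2:-5}" body
  while true; do
    body="$(curl -sf "$url")" || return 1
    case "$body" in
      *'"id":"IN_PROGRESS"'*) sleep "$interval" ;;
      *) printf '%s\n' "$body"; return 0 ;;
    esac
  done
}
```

To use the helpers:

1. Pipe the output of `./bin/flink savepoint $JOB_ID /tmp/flink-savepoints -detached` into `extract_trigger_id`.
2. Build the status URL from that ID with `savepoint_status_url`.
3. Pass the URL to `wait_for_savepoint`.

Matching on the raw response text avoids a dependency on a JSON tool such as `jq`. The trade-off is that the match breaks if the REST output format differs from the assumed shape.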
