This is an automated email from the ASF dual-hosted git repository.

hangxiang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/flink.git

commit d0dbd51c89df6374d5eb54e12925e032a255385e
Author: 周仁祥 <[email protected]>
AuthorDate: Tue Dec 12 16:19:43 2023 +0800

    [FLINK-32881][checkpoint] update docs for savepoint detached option
---
 docs/content.zh/docs/deployment/cli.md       | 39 ++++++++++++++++++++++++++++
 docs/content.zh/docs/ops/state/savepoints.md | 10 +++++++
 docs/content.zh/docs/ops/upgrading.md        |  3 +++
 docs/content/docs/deployment/cli.md          | 39 ++++++++++++++++++++++++++++
 docs/content/docs/ops/state/savepoints.md    | 12 +++++++++
 docs/content/docs/ops/upgrading.md           |  3 +++
 6 files changed, 106 insertions(+)

diff --git a/docs/content.zh/docs/deployment/cli.md b/docs/content.zh/docs/deployment/cli.md
index 544dcb835b8..d71dbdd4e2c 100644
--- a/docs/content.zh/docs/deployment/cli.md
+++ b/docs/content.zh/docs/deployment/cli.md
@@ -125,6 +125,43 @@ Lastly, you can optionally provide what should be the 
[binary format]({{< ref "d
 
 The path to the savepoint can be used later on to [restart the Flink 
job](#starting-a-job-from-a-savepoint).
 
+If the job's state is large, the client may fail with a timeout exception because it has to wait for the savepoint to finish.
+```
+Triggering savepoint for job bec5244e09634ad71a80785937a9732d.
+Waiting for response...
+
+--------------------------------------------------------------
+The program finished with the following exception:
+
+org.apache.flink.util.FlinkException: Triggering a savepoint for the job bec5244e09634ad71a80785937a9732d failed.
+        at org.apache.flink.client.cli.CliFrontend.triggerSavepoint(CliFrontend.java:828)
+        at org.apache.flink.client.cli.CliFrontend.lambda$savepoint$8(CliFrontend.java:794)
+        at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:1078)
+        at org.apache.flink.client.cli.CliFrontend.savepoint(CliFrontend.java:779)
+        at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1150)
+        at org.apache.flink.client.cli.CliFrontend.lambda$mainInternal$9(CliFrontend.java:1226)
+        at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
+        at org.apache.flink.client.cli.CliFrontend.mainInternal(CliFrontend.java:1226)
+        at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1194)
+Caused by: java.util.concurrent.TimeoutException
+        at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
+        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
+        at org.apache.flink.client.cli.CliFrontend.triggerSavepoint(CliFrontend.java:822)
+        ... 8 more
+```
+In this case, use the `-detached` option to trigger a detached savepoint; the client returns as soon as it receives the trigger id.
+```bash
+$ ./bin/flink savepoint \
+      $JOB_ID \
+      /tmp/flink-savepoints \
+      -detached
+```
+```
+Triggering savepoint in detached mode for job bec5244e09634ad71a80785937a9732d.
+Successfully trigger manual savepoint, triggerId: 2505bbd12c5b58fd997d0f193db44b97
+```
+You can query the status of the detached savepoint through the [REST API]({{< ref "docs/ops/rest_api" >}}/#jobs-jobid-checkpoints-triggerid).
+
 #### Disposing a Savepoint
 
 The `savepoint` action can be also used to remove savepoints. `--dispose` with 
the corresponding 
@@ -214,6 +251,8 @@ Use the `--drain` flag if you want to terminate the job 
permanently.
 If you want to resume the job at a later point in time, then do not drain the 
pipeline because it could lead to incorrect results when the job is resumed.
 {{< /hint >}}
 
+If you want to trigger the savepoint in detached mode, add the `-detached` option to the command.
+
 Lastly, you can optionally provide what should be the [binary format]({{< ref 
"docs/ops/state/savepoints" >}}#savepoint-format) of the savepoint.
 
 #### Cancelling a Job Ungracefully
diff --git a/docs/content.zh/docs/ops/state/savepoints.md b/docs/content.zh/docs/ops/state/savepoints.md
index 7b14669db4f..2294fc97923 100644
--- a/docs/content.zh/docs/ops/state/savepoints.md
+++ b/docs/content.zh/docs/ops/state/savepoints.md
@@ -142,6 +142,14 @@ $ bin/flink savepoint :jobId [:targetDirectory]
 $ bin/flink savepoint --type [native/canonical] :jobId [:targetDirectory]
 ```
 
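+When you trigger a savepoint with the command above, the client has to wait for the savepoint to complete, so the client may time out when the job's state is large. In that case, you can trigger the savepoint in detached mode.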
+
+```shell
+$ bin/flink savepoint :jobId [:targetDirectory] -detached
+```
+
+With this command, the client returns as soon as it receives the trigger id of the savepoint. You can monitor the progress of the savepoint through the [REST API]({{< ref "docs/ops/rest_api" >}}/#jobs-jobid-checkpoints-triggerid).
+
 #### Trigger a Savepoint with YARN
 
 ```shell
@@ -160,6 +168,8 @@ $ bin/flink stop --type [native/canonical] --savepointPath 
[:targetDirectory] :j
 
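 This automatically triggers a savepoint for the job with ID `:jobid` and stops the job. You can also specify a target file system directory to store the savepoint in. The directory must be accessible by the JobManager(s) and TaskManager(s). You can also specify the format in which the savepoint is created. If no format is specified, the savepoint is created in canonical format.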
 
+If you want to trigger the savepoint in detached mode, append the `-detached` option to the command.
+
 ### Resuming from Savepoints
 
 ```shell
diff --git a/docs/content.zh/docs/ops/upgrading.md b/docs/content.zh/docs/ops/upgrading.md
index 322c00a566d..ff433609001 100644
--- a/docs/content.zh/docs/ops/upgrading.md
+++ b/docs/content.zh/docs/ops/upgrading.md
@@ -70,6 +70,8 @@ That same code would have to be recompiled when upgrading to 
1.16.0 though.
 ```
 It is recommended to take savepoints periodically so that you can restart an application from a previous point in time.
 
+If you want to trigger a savepoint in detached mode, just add the `-detached` option.
+
 * Taking a savepoint and stopping the application.
 ```bash
 > ./bin/flink cancel -s [path to savepoint] <jobID>
@@ -216,6 +218,7 @@ val mappedEvents: DataStream[(Int, Long)] = events
 ```shell
 $ bin/flink stop [--savepointPath :savepointPath] :jobId
 ```
+If you want to trigger the savepoint in detached mode, append the `-detached` option to the command.
 
 For more details, please read the [savepoint documentation]({{< ref "docs/ops/state/savepoints" >}}).
 
diff --git a/docs/content/docs/deployment/cli.md b/docs/content/docs/deployment/cli.md
index 198c1e1c93b..a8818a4fb6b 100644
--- a/docs/content/docs/deployment/cli.md
+++ b/docs/content/docs/deployment/cli.md
@@ -123,6 +123,43 @@ Lastly, you can optionally provide what should be the 
[binary format]({{< ref "d
 
 The path to the savepoint can be used later on to [restart the Flink 
job](#starting-a-job-from-a-savepoint).
 
+If the job's state is large, the client may fail with a timeout exception because it has to wait for the savepoint to finish.
+```
+Triggering savepoint for job bec5244e09634ad71a80785937a9732d.
+Waiting for response...
+
+--------------------------------------------------------------
+The program finished with the following exception:
+
+org.apache.flink.util.FlinkException: Triggering a savepoint for the job bec5244e09634ad71a80785937a9732d failed.
+        at org.apache.flink.client.cli.CliFrontend.triggerSavepoint(CliFrontend.java:828)
+        at org.apache.flink.client.cli.CliFrontend.lambda$savepoint$8(CliFrontend.java:794)
+        at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:1078)
+        at org.apache.flink.client.cli.CliFrontend.savepoint(CliFrontend.java:779)
+        at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1150)
+        at org.apache.flink.client.cli.CliFrontend.lambda$mainInternal$9(CliFrontend.java:1226)
+        at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
+        at org.apache.flink.client.cli.CliFrontend.mainInternal(CliFrontend.java:1226)
+        at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1194)
+Caused by: java.util.concurrent.TimeoutException
+        at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
+        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
+        at org.apache.flink.client.cli.CliFrontend.triggerSavepoint(CliFrontend.java:822)
+        ... 8 more
+```
+In this case, use the `-detached` option to trigger a detached savepoint; the client returns as soon as it receives the trigger id.
+```bash
+$ ./bin/flink savepoint \
+      $JOB_ID \
+      /tmp/flink-savepoints \
+      -detached
+```
+```
+Triggering savepoint in detached mode for job bec5244e09634ad71a80785937a9732d.
+Successfully trigger manual savepoint, triggerId: 2505bbd12c5b58fd997d0f193db44b97
+```
+You can query the status of the detached savepoint through the [REST API]({{< ref "docs/ops/rest_api" >}}/#jobs-jobid-checkpoints-triggerid).
+
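The trigger id printed in detached mode is the value a later status lookup needs. As a minimal sketch, the snippet below pulls that id out of CLI output shaped like the sample above and assembles a status URL. The helper names `extract_trigger_id` and `status_url` are hypothetical, the REST address `http://localhost:8081` is an assumed default, and the URL path is only inferred from the `#jobs-jobid-checkpoints-triggerid` anchor, so confirm the endpoint for savepoints in the REST API reference before relying on it.

```shell
#!/usr/bin/env bash
# Sketch: pull the trigger id out of the output of a detached savepoint.
# Assumes the output contains "triggerId: <hex id>", as in the sample above.
extract_trigger_id() {
  # Print the first hex id that follows "triggerId:"; print nothing if absent.
  sed -n 's/.*triggerId:[[:space:]]*\([0-9a-f][0-9a-f]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: build a status URL for a trigger id.
# ASSUMPTION: the path mirrors the docs anchor and may not be the right
# endpoint for savepoints; check it against the REST API reference.
status_url() {
  local rest_base="$1" job_id="$2" trigger_id="$3"
  printf '%s/jobs/%s/checkpoints/%s\n' "$rest_base" "$job_id" "$trigger_id"
}

# Usage with the sample output from the docs (no cluster needed).
sample_output='Triggering savepoint in detached mode for job bec5244e09634ad71a80785937a9732d.
Successfully trigger manual savepoint, triggerId: 2505bbd12c5b58fd997d0f193db44b97'

trigger_id="$(printf '%s\n' "$sample_output" | extract_trigger_id)"
status_url "http://localhost:8081" "bec5244e09634ad71a80785937a9732d" "$trigger_id"
```

Parsing the human-readable output is fragile if the message wording changes, so treat this as a convenience for scripts rather than a stable interface.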
 #### Disposing a Savepoint
 
 The `savepoint` action can be also used to remove savepoints. `--dispose` with 
the corresponding 
@@ -212,6 +249,8 @@ Use the `--drain` flag if you want to terminate the job 
permanently.
 If you want to resume the job at a later point in time, then do not drain the 
pipeline because it could lead to incorrect results when the job is resumed.
 {{< /hint >}}
 
+If you want to trigger the savepoint in detached mode, add the `-detached` option to the command.
+
 Lastly, you can optionally provide what should be the [binary format]({{< ref 
"docs/ops/state/savepoints" >}}#savepoint-format) of the savepoint.
 
 #### Cancelling a Job Ungracefully
diff --git a/docs/content/docs/ops/state/savepoints.md b/docs/content/docs/ops/state/savepoints.md
index c08587ec178..c13cd62e6cc 100644
--- a/docs/content/docs/ops/state/savepoints.md
+++ b/docs/content/docs/ops/state/savepoints.md
@@ -167,6 +167,16 @@ the savepoint should be taken. By default the savepoint 
will be taken in canonic
 $ bin/flink savepoint --type [native/canonical] :jobId [:targetDirectory]
 ```
 
+When using the above command to trigger a savepoint, the client needs to wait for the savepoint
+to complete. Therefore, the client may time out when the state size of the job is large.
+In this case, you can trigger the savepoint in detached mode.
+
+```shell
+$ bin/flink savepoint :jobId [:targetDirectory] -detached
+```
+When using this command, the client returns immediately after getting the trigger id of
+the savepoint. You can monitor the status of the savepoint through the [REST API]({{< ref "docs/ops/rest_api" >}}/#jobs-jobid-checkpoints-triggerid).
+
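Monitoring a detached savepoint usually means polling until it is no longer running. The sketch below shows one way to structure that loop. `wait_for_savepoint` is a hypothetical helper, and it assumes the status response contains the token `IN_PROGRESS` while the savepoint is still being taken, which is not stated in the text above and should be checked against the REST API reference.

```shell
#!/usr/bin/env bash
# Sketch: poll a status command until a detached savepoint is no longer running.
# ASSUMPTION: the response contains the token IN_PROGRESS while the savepoint
# is still being taken; verify the real response format in the REST API docs.
#
# Usage: wait_for_savepoint MAX_ATTEMPTS DELAY_SECONDS STATUS_COMMAND [ARGS...]
# Returns 0 and prints the final response, 1 if attempts run out,
# 2 if the status command itself fails.
wait_for_savepoint() {
  local max_attempts="$1" delay_seconds="$2"
  shift 2
  local attempt=1 response
  while [ "$attempt" -le "$max_attempts" ]; do
    # Run the caller-supplied status command and capture its output.
    response="$("$@")" || return 2
    case "$response" in
      *IN_PROGRESS*) sleep "$delay_seconds" ;;
      *) printf '%s\n' "$response"; return 0 ;;
    esac
    attempt=$((attempt + 1))
  done
  return 1
}

# Example call (not executed here; STATUS_URL is a placeholder you must set):
#   wait_for_savepoint 60 5 curl -sf "$STATUS_URL"
```

Passing the status command as arguments keeps the loop independent of the endpoint, so the same function can be exercised with a stub command before pointing it at a real cluster.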
 #### Trigger a Savepoint with YARN
 
 ```shell
@@ -186,6 +196,8 @@ you can specify a target file system directory to store the 
savepoint in. The di
 accessible by the JobManager(s) and TaskManager(s). You can also pass a type 
in which the savepoint
 should be taken. By default the savepoint will be taken in canonical format.
 
+If you want to trigger the savepoint in detached mode, add the `-detached` option to the command.
+
 ### Resuming from Savepoints
 
 ```shell
diff --git a/docs/content/docs/ops/upgrading.md b/docs/content/docs/ops/upgrading.md
index cc7d5e28cd8..b06427c2a96 100644
--- a/docs/content/docs/ops/upgrading.md
+++ b/docs/content/docs/ops/upgrading.md
@@ -103,6 +103,7 @@ There are two ways of taking a savepoint from a running 
streaming application.
 > ./bin/flink savepoint <jobID> [pathToSavepoint]
 ```
 It is recommended to periodically take savepoints in order to be able to 
restart an application from a previous point in time.
+If you want to trigger a savepoint in detached mode, just add the `-detached` option.
 
 * Taking a savepoint and stopping the application as a single action. 
 ```bash
@@ -251,6 +252,8 @@ You can do this with the command:
 $ bin/flink stop [--savepointPath :savepointPath] :jobId
 ```
 
+If you want to trigger the savepoint in detached mode, add the `-detached` option to the command.
+
 For more details, please read the [savepoint documentation]({{< ref 
"docs/ops/state/savepoints" >}}).
 
 #### STEP 2: Update your cluster to the new Flink version.
