[GitHub] [flink] link3280 commented on a change in pull request #16928: [FLINK-23899][docs-zh] Translate the "Elastic Scaling" page into Chinese

GitBox Sun, 22 Aug 2021 00:21:23 -0700


link3280 commented on a change in pull request #16928:
URL: https://github.com/apache/flink/pull/16928#discussion_r693453484




##########
File path: docs/content.zh/docs/deployment/elastic_scaling.md
##########
@@ -23,134 +23,132 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-# Elastic Scaling
+# 弹性伸缩
 
-Apache Flink allows you to rescale your jobs. You can do this manually by 
stopping the job and restarting from the savepoint created during shutdown with 
a different parallelism.
+在 Apache Flink 中，可以通过手动停止 Job，然后从停止时创建的 Savepoint 恢复，最后重新指定并行度的方式来重新伸缩 Job。
 
-This page describes options where Flink automatically adjusts the parallelism 
instead.
+这个文档描述的特性是 Flink 如何自动地调整并行度。
 
-## Reactive Mode
+## Reactive 模式
 
 {{< hint info >}}
-Reactive mode is an MVP ("minimum viable product") feature. The Flink 
community is actively looking for feedback by users through our mailing lists. 
Please check the limitations listed on this page.
+Reactive 模式是一个 MVP （minimum viable product，最小可行产品）特性。目前 Flink 
社区正在积极地从邮件列表中获取用户的使用反馈。请注意文中列举的一些限制。
 {{< /hint >}}
 
-Reactive Mode configures a job so that it always uses all resources available 
in the cluster. Adding a TaskManager will scale up your job, removing resources 
will scale it down. Flink will manage the parallelism of the job, always 
setting it to the highest possible values.
+在 Reactive 模式下，Job 会使用集群中所有的资源。当增加 TaskManager 时，Job 会自动扩容。当删除时，就会自动缩容。Flink 
会管理 Job 的并行度，始终会尽可能地使用最大值。
 
-Reactive Mode restarts a job on a rescaling event, restoring it from the 
latest completed checkpoint. This means that there is no overhead of creating a 
savepoint (which is needed for manually rescaling a job). Also, the amount of 
data that is reprocessed after rescaling depends on the checkpointing interval, 
and the restore time depends on the state size. 
+当发生伸缩时，Job 会被重启，并且会从最新的 Checkpoint 中恢复。这就意味着不需要花费额外的开销去创建 
Savepoint。当然，所需要重新处理的数据量取决于 Checkpoint 的间隔时长，而恢复的时间取决于状态的大小。
 
-The Reactive Mode allows Flink users to implement a powerful autoscaling 
mechanism, by having an external service monitor certain metrics, such as 
consumer lag, aggregate CPU utilization, throughput or latency. As soon as 
these metrics are above or below a certain threshold, additional TaskManagers 
can be added or removed from the Flink cluster. This could be implemented 
through changing the [replica 
factor](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#replicas)
 of a Kubernetes deployment, or an [autoscaling 
group](https://docs.aws.amazon.com/autoscaling/ec2/userguide/AutoScalingGroup.html)
 on AWS. This external service only needs to handle the resource allocation and 
deallocation. Flink will take care of keeping the job running with the 
resources available.
- 
-### Getting started
+借助 Reactive 模式，Flink 用户可以通过一些外部的监控服务产生的指标，例如：消费延迟、CPU 
利用率汇总、吞吐量、延迟等，实现一个强大的自动伸缩机制。当上述的这些指标超出或者低于一定的阈值时，增加或者减少 TaskManager 的数量。在 
Kubernetes 中，可以通过改变 Deployment 的[副本数（Replica 
Factor）](https://kubernetes.io/zh/docs/concepts/workloads/controllers/deployment/#replicas)
 实现。而在 AWS 中，可以通过改变 [Auto Scaling 
组](https://docs.aws.amazon.com/zh_cn/autoscaling/ec2/userguide/AutoScalingGroup.html)
 来实现。这类外部服务只需要负责资源的分配以及回收，而 Flink 则负责在这些资源上运行 Job。
 
-If you just want to try out Reactive Mode, follow these instructions. They 
assume that you are deploying Flink on a single machine.
+<a name="getting-started"></a>

Review comment:
       Why is there an extra `<a>` anchor tag here without an actual URL?

##########
File path: docs/content.zh/docs/deployment/elastic_scaling.md
##########
@@ -23,134 +23,132 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-# Elastic Scaling
+# 弹性伸缩
 
-Apache Flink allows you to rescale your jobs. You can do this manually by 
stopping the job and restarting from the savepoint created during shutdown with 
a different parallelism.
+在 Apache Flink 中，可以通过手动停止 Job，然后从停止时创建的 Savepoint 恢复，最后重新指定并行度的方式来重新伸缩 Job。
 
-This page describes options where Flink automatically adjusts the parallelism 
instead.
+这个文档描述的特性是 Flink 如何自动地调整并行度。

Review comment:
       I think 'options' is not translated correctly. Here `options` means that 
users can choose either Reactive Mode or the Adaptive Scheduler.

##########
File path: docs/content.zh/docs/deployment/elastic_scaling.md
##########
@@ -23,134 +23,132 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-# Elastic Scaling
+# 弹性伸缩

Review comment:
       Maybe we could use '扩缩容' (scaling up and down), which is more consistent 
with the '扩容' (scale up) and '缩容' (scale down) used in the following content.

##########
File path: docs/content.zh/docs/deployment/elastic_scaling.md
##########
@@ -23,134 +23,132 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-# Elastic Scaling
+# 弹性伸缩
 
-Apache Flink allows you to rescale your jobs. You can do this manually by 
stopping the job and restarting from the savepoint created during shutdown with 
a different parallelism.
+在 Apache Flink 中，可以通过手动停止 Job，然后从停止时创建的 Savepoint 恢复，最后重新指定并行度的方式来重新伸缩 Job。
 
-This page describes options where Flink automatically adjusts the parallelism 
instead.
+这个文档描述的特性是 Flink 如何自动地调整并行度。
 
-## Reactive Mode
+## Reactive 模式
 
 {{< hint info >}}
-Reactive mode is an MVP ("minimum viable product") feature. The Flink 
community is actively looking for feedback by users through our mailing lists. 
Please check the limitations listed on this page.
+Reactive 模式是一个 MVP （minimum viable product，最小可行产品）特性。目前 Flink 
社区正在积极地从邮件列表中获取用户的使用反馈。请注意文中列举的一些限制。
 {{< /hint >}}
 
-Reactive Mode configures a job so that it always uses all resources available 
in the cluster. Adding a TaskManager will scale up your job, removing resources 
will scale it down. Flink will manage the parallelism of the job, always 
setting it to the highest possible values.
+在 Reactive 模式下，Job 会使用集群中所有的资源。当增加 TaskManager 时，Job 会自动扩容。当删除时，就会自动缩容。Flink 
会管理 Job 的并行度，始终会尽可能地使用最大值。
 
-Reactive Mode restarts a job on a rescaling event, restoring it from the 
latest completed checkpoint. This means that there is no overhead of creating a 
savepoint (which is needed for manually rescaling a job). Also, the amount of 
data that is reprocessed after rescaling depends on the checkpointing interval, 
and the restore time depends on the state size. 
+当发生伸缩时，Job 会被重启，并且会从最新的 Checkpoint 中恢复。这就意味着不需要花费额外的开销去创建 
Savepoint。当然，所需要重新处理的数据量取决于 Checkpoint 的间隔时长，而恢复的时间取决于状态的大小。
 
-The Reactive Mode allows Flink users to implement a powerful autoscaling 
mechanism, by having an external service monitor certain metrics, such as 
consumer lag, aggregate CPU utilization, throughput or latency. As soon as 
these metrics are above or below a certain threshold, additional TaskManagers 
can be added or removed from the Flink cluster. This could be implemented 
through changing the [replica 
factor](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#replicas)
 of a Kubernetes deployment, or an [autoscaling 
group](https://docs.aws.amazon.com/autoscaling/ec2/userguide/AutoScalingGroup.html)
 on AWS. This external service only needs to handle the resource allocation and 
deallocation. Flink will take care of keeping the job running with the 
resources available.
- 
-### Getting started
+借助 Reactive 模式，Flink 用户可以通过一些外部的监控服务产生的指标，例如：消费延迟、CPU 
利用率汇总、吞吐量、延迟等，实现一个强大的自动伸缩机制。当上述的这些指标超出或者低于一定的阈值时，增加或者减少 TaskManager 的数量。在 
Kubernetes 中，可以通过改变 Deployment 的[副本数（Replica 
Factor）](https://kubernetes.io/zh/docs/concepts/workloads/controllers/deployment/#replicas)
 实现。而在 AWS 中，可以通过改变 [Auto Scaling 
组](https://docs.aws.amazon.com/zh_cn/autoscaling/ec2/userguide/AutoScalingGroup.html)
 来实现。这类外部服务只需要负责资源的分配以及回收，而 Flink 则负责在这些资源上运行 Job。
 
-If you just want to try out Reactive Mode, follow these instructions. They 
assume that you are deploying Flink on a single machine.
+<a name="getting-started"></a>
+
+### 入门
+
+你可以参考下面的步骤试用 Reactive 模式。以下步骤假设你使用的是单台机器部署 Flink。
 
 ```bash
 
-# these instructions assume you are in the root directory of a Flink distribution.
+# 以下步骤假设你当前目录处于 Flink 发行版的根目录。
 
-# Put Job into lib/ directory
+# 将 Job 拷贝到 lib/ 目录下
 cp ./examples/streaming/TopSpeedWindowing.jar lib/
-# Submit Job in Reactive Mode
+# 使用 Reactive 模式提交 Job
 ./bin/standalone-job.sh start -Dscheduler-mode=reactive -Dexecution.checkpointing.interval="10s" -j org.apache.flink.streaming.examples.windowing.TopSpeedWindowing
-# Start first TaskManager
+# 启动第一个 TaskManager
 ./bin/taskmanager.sh start
 ```
 
-Let's quickly examine the used submission command:
-- `./bin/standalone-job.sh start` deploys Flink in [Application Mode]({{< ref 
"docs/deployment/overview" >}}#application-mode)
-- `-Dscheduler-mode=reactive` enables Reactive Mode.
-- `-Dexecution.checkpointing.interval="10s"` configure checkpointing and 
restart strategy.
-- the last argument is passing the Job's main class name.
+让我们快速解释下上面每一条执行的命令：
+- `./bin/standalone-job.sh start` 使用 [Application 模式]({{< ref 
"docs/deployment/overview" >}}#application-mode) 部署 Flink。
+- `-Dscheduler-mode=reactive` 启动 Reactive 模式。
+- `-Dexecution.checkpointing.interval="10s"` 配置 Checkpoint 和重启策略。
+- 最后一个参数是 Job 的主函数名。
 
-You have now started a Flink job in Reactive Mode. The [web 
interface](http://localhost:8081) shows that the job is running on one 
TaskManager. If you want to scale up the job, simply add another TaskManager to 
the cluster:
+你现在已经启动了一个 Reactive 模式下的 Flink Job。在[Web 界面](http://localhost:8081)上，你可以看到 Job 
运行在一个 TaskManager 上。如果你想要扩容，可以再添加一个 TaskManager，
 ```bash
-# Start additional TaskManager
+# 额外启动一个 TaskManager
 ./bin/taskmanager.sh start
 ```
 
-To scale down, remove a TaskManager instance.
+如果想要缩容，可以关掉一个 TaskManager。
 ```bash
-# Remove a TaskManager
+# 关闭 TaskManager
 ./bin/taskmanager.sh stop
 ```
 
-### Usage
-
-#### Configuration
+### 用法
 
-To enable Reactive Mode, you need to configure `scheduler-mode` to `reactive`.
+#### 配置
 
-The **parallelism of individual operators in a job will be determined by the 
scheduler**. It is not configurable
-and will be ignored if explicitly set, either on individual operators or the 
entire job.
+通过将 `scheduler-mode` 配置成 `reactive`，你可以开启 Reactive 模式。
 
-The only way of influencing the parallelism is by setting a max parallelism 
for an operator
-(which will be respected by the scheduler). The maxParallelism is bounded by 
2^15 (32768).
-If you do not set a max parallelism for individual operators or the entire 
job, the
-[default parallelism rules]({{< ref "docs/dev/datastream/execution/parallel" 
>}}#setting-the-maximum-parallelism) will be applied,
-potentially applying lower bounds than the max possible value. As with the 
default scheduling mode, please take
-the [best practices for parallelism]({{< ref "docs/ops/production_ready" 
>}}#set-an-explicit-max-parallelism) into consideration.
+每个独立算子的并行度都将由调度器来决定，而不是由配置决定。当并行度在算子上或者整个 Job 上被显式设置时，这些值被会忽略。

Review comment:
       Is the strong (bold) style from the original missing here?

##########
File path: docs/content.zh/docs/deployment/elastic_scaling.md
##########
@@ -23,134 +23,132 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-# Elastic Scaling
+# 弹性伸缩
 
-Apache Flink allows you to rescale your jobs. You can do this manually by 
stopping the job and restarting from the savepoint created during shutdown with 
a different parallelism.
+在 Apache Flink 中，可以通过手动停止 Job，然后从停止时创建的 Savepoint 恢复，最后重新指定并行度的方式来重新伸缩 Job。
 
-This page describes options where Flink automatically adjusts the parallelism 
instead.
+这个文档描述的特性是 Flink 如何自动地调整并行度。
 
-## Reactive Mode
+## Reactive 模式
 
 {{< hint info >}}
-Reactive mode is an MVP ("minimum viable product") feature. The Flink 
community is actively looking for feedback by users through our mailing lists. 
Please check the limitations listed on this page.
+Reactive 模式是一个 MVP （minimum viable product，最小可行产品）特性。目前 Flink 
社区正在积极地从邮件列表中获取用户的使用反馈。请注意文中列举的一些限制。
 {{< /hint >}}
 
-Reactive Mode configures a job so that it always uses all resources available 
in the cluster. Adding a TaskManager will scale up your job, removing resources 
will scale it down. Flink will manage the parallelism of the job, always 
setting it to the highest possible values.
+在 Reactive 模式下，Job 会使用集群中所有的资源。当增加 TaskManager 时，Job 会自动扩容。当删除时，就会自动缩容。Flink 
会管理 Job 的并行度，始终会尽可能地使用最大值。
 
-Reactive Mode restarts a job on a rescaling event, restoring it from the 
latest completed checkpoint. This means that there is no overhead of creating a 
savepoint (which is needed for manually rescaling a job). Also, the amount of 
data that is reprocessed after rescaling depends on the checkpointing interval, 
and the restore time depends on the state size. 
+当发生伸缩时，Job 会被重启，并且会从最新的 Checkpoint 中恢复。这就意味着不需要花费额外的开销去创建 
Savepoint。当然，所需要重新处理的数据量取决于 Checkpoint 的间隔时长，而恢复的时间取决于状态的大小。
 
-The Reactive Mode allows Flink users to implement a powerful autoscaling 
mechanism, by having an external service monitor certain metrics, such as 
consumer lag, aggregate CPU utilization, throughput or latency. As soon as 
these metrics are above or below a certain threshold, additional TaskManagers 
can be added or removed from the Flink cluster. This could be implemented 
through changing the [replica 
factor](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#replicas)
 of a Kubernetes deployment, or an [autoscaling 
group](https://docs.aws.amazon.com/autoscaling/ec2/userguide/AutoScalingGroup.html)
 on AWS. This external service only needs to handle the resource allocation and 
deallocation. Flink will take care of keeping the job running with the 
resources available.
- 
-### Getting started
+借助 Reactive 模式，Flink 用户可以通过一些外部的监控服务产生的指标，例如：消费延迟、CPU 
利用率汇总、吞吐量、延迟等，实现一个强大的自动伸缩机制。当上述的这些指标超出或者低于一定的阈值时，增加或者减少 TaskManager 的数量。在 
Kubernetes 中，可以通过改变 Deployment 的[副本数（Replica 
Factor）](https://kubernetes.io/zh/docs/concepts/workloads/controllers/deployment/#replicas)
 实现。而在 AWS 中，可以通过改变 [Auto Scaling 
组](https://docs.aws.amazon.com/zh_cn/autoscaling/ec2/userguide/AutoScalingGroup.html)
 来实现。这类外部服务只需要负责资源的分配以及回收，而 Flink 则负责在这些资源上运行 Job。
 
-If you just want to try out Reactive Mode, follow these instructions. They 
assume that you are deploying Flink on a single machine.
+<a name="getting-started"></a>
+
+### 入门
+
+你可以参考下面的步骤试用 Reactive 模式。以下步骤假设你使用的是单台机器部署 Flink。
 
 ```bash
 
-# these instructions assume you are in the root directory of a Flink distribution.
+# 以下步骤假设你当前目录处于 Flink 发行版的根目录。
 
-# Put Job into lib/ directory
+# 将 Job 拷贝到 lib/ 目录下
 cp ./examples/streaming/TopSpeedWindowing.jar lib/
-# Submit Job in Reactive Mode
+# 使用 Reactive 模式提交 Job
 ./bin/standalone-job.sh start -Dscheduler-mode=reactive -Dexecution.checkpointing.interval="10s" -j org.apache.flink.streaming.examples.windowing.TopSpeedWindowing
-# Start first TaskManager
+# 启动第一个 TaskManager
 ./bin/taskmanager.sh start
 ```
 
-Let's quickly examine the used submission command:
-- `./bin/standalone-job.sh start` deploys Flink in [Application Mode]({{< ref 
"docs/deployment/overview" >}}#application-mode)
-- `-Dscheduler-mode=reactive` enables Reactive Mode.
-- `-Dexecution.checkpointing.interval="10s"` configure checkpointing and 
restart strategy.
-- the last argument is passing the Job's main class name.
+让我们快速解释下上面每一条执行的命令：
+- `./bin/standalone-job.sh start` 使用 [Application 模式]({{< ref 
"docs/deployment/overview" >}}#application-mode) 部署 Flink。
+- `-Dscheduler-mode=reactive` 启动 Reactive 模式。
+- `-Dexecution.checkpointing.interval="10s"` 配置 Checkpoint 和重启策略。
+- 最后一个参数是 Job 的主函数名。
 
-You have now started a Flink job in Reactive Mode. The [web 
interface](http://localhost:8081) shows that the job is running on one 
TaskManager. If you want to scale up the job, simply add another TaskManager to 
the cluster:
+你现在已经启动了一个 Reactive 模式下的 Flink Job。在[Web 界面](http://localhost:8081)上，你可以看到 Job 
运行在一个 TaskManager 上。如果你想要扩容，可以再添加一个 TaskManager，
 ```bash
-# Start additional TaskManager
+# 额外启动一个 TaskManager
 ./bin/taskmanager.sh start
 ```
 
-To scale down, remove a TaskManager instance.
+如果想要缩容，可以关掉一个 TaskManager。
 ```bash
-# Remove a TaskManager
+# 关闭 TaskManager
 ./bin/taskmanager.sh stop
 ```
 
-### Usage
-
-#### Configuration
+### 用法
 
-To enable Reactive Mode, you need to configure `scheduler-mode` to `reactive`.
+#### 配置
 
-The **parallelism of individual operators in a job will be determined by the 
scheduler**. It is not configurable
-and will be ignored if explicitly set, either on individual operators or the 
entire job.
+通过将 `scheduler-mode` 配置成 `reactive`，你可以开启 Reactive 模式。
 
-The only way of influencing the parallelism is by setting a max parallelism 
for an operator
-(which will be respected by the scheduler). The maxParallelism is bounded by 
2^15 (32768).
-If you do not set a max parallelism for individual operators or the entire 
job, the
-[default parallelism rules]({{< ref "docs/dev/datastream/execution/parallel" 
>}}#setting-the-maximum-parallelism) will be applied,
-potentially applying lower bounds than the max possible value. As with the 
default scheduling mode, please take
-the [best practices for parallelism]({{< ref "docs/ops/production_ready" 
>}}#set-an-explicit-max-parallelism) into consideration.
+每个独立算子的并行度都将由调度器来决定，而不是由配置决定。当并行度在算子上或者整个 Job 上被显式设置时，这些值被会忽略。
 
-Note that such a high max parallelism might affect performance of the job, 
since more internal structures are needed to maintain [some internal 
structures](https://flink.apache.org/features/2017/07/04/flink-rescalable-state.html)
 of Flink.
+而唯一能影响并行度的方式只有通过设置算子的最大并行度（调度器不会忽略这个值）。
+最大并行度 maxParallelism 参数的值最大不能超过 2^15（32768）。如果你们没有给算子或者整个 Job 
设置最大并行度，会采用[默认的最大并行度规则]({{< ref "docs/dev/datastream/execution/parallel" 
>}}#setting-the-maximum-parallelism)。
+这个值很有可能会低于它的最大上限。当使用默认的调度模式时，请参考[并行度的最佳实践]({{< ref "docs/ops/production_ready" 
>}}#set-an-explicit-max-parallelism)。
 
-When enabling Reactive Mode, the 
[`jobmanager.adaptive-scheduler.resource-wait-timeout`]({{< ref 
"docs/deployment/config">}}#jobmanager-adaptive-scheduler-resource-wait-timeout)
 configuration key will default to `-1`. This means that the JobManager will 
run forever waiting for sufficient resources.
-If you want the JobManager to stop after a certain time without enough 
TaskManagers to run the job, configure 
`jobmanager.adaptive-scheduler.resource-wait-timeout`.
+需要注意的是，过大的并行度会影响 Job 的性能，因为 Flink 
为此需要维护更多的[内部结构](https://flink.apache.org/features/2017/07/04/flink-rescalable-state.html)。
 
-With Reactive Mode enabled, the 
[`jobmanager.adaptive-scheduler.resource-stabilization-timeout`]({{< ref 
"docs/deployment/config">}}#jobmanager-adaptive-scheduler-resource-stabilization-timeout)
 configuration key will default to `0`: Flink will start running the job, as 
soon as there are sufficient resources available.
-In scenarios where TaskManagers are not connecting at the same time, but 
slowly one after another, this behavior leads to a job restart whenever a 
TaskManager connects. Increase this configuration value if you want to wait for 
the resources to stabilize before scheduling the job.
-Additionally, one can configure 
[`jobmanager.adaptive-scheduler.min-parallelism-increase`]({{< ref 
"docs/deployment/config">}}#jobmanager-adaptive-scheduler-min-parallelism-increase):
 This configuration option specifics the minimum amount of additional, 
aggregate parallelism increase before triggering a scale-up. For example if you 
have a job with a source (parallelism=2) and a sink (parallelism=2), the 
aggregate parallelism is 4. By default, the configuration key is set to 1, so 
any increase in the aggregate parallelism will trigger a restart.
+当开启 Reactive 模式时，[`jobmanager.adaptive-scheduler.resource-wait-timeout`]({{< 
ref 
"docs/deployment/config">}}#jobmanager-adaptive-scheduler-resource-wait-timeout)
 配置的默认值是 `-1`。这意味着，JobManager 会一直等待，直到拥有足够的资源。
+如果你想要 JobManager 在没有拿到足够的 TaskManager 的一段时间后关闭，可以配置这个参数。
 
-#### Recommendations
+当开启 Reactive 
模式时，[`jobmanager.adaptive-scheduler.resource-stabilization-timeout`]({{< ref 
"docs/deployment/config">}}#jobmanager-adaptive-scheduler-resource-stabilization-timeout)
 配置的默认值是 `0`：Flink 只要有足够的资源，就会启动 Job。
+在 TaskManager 一个一个而不是同时启动的情况下，会造成 Job 在每一个 TaskManager 启动时重启一次。当你希望等待资源稳定后再启动 
Job，那么可以增加这个配置的值。
+另外，你还可以配置 [`jobmanager.adaptive-scheduler.min-parallelism-increase`]({{< ref 
"docs/deployment/config">}}#jobmanager-adaptive-scheduler-min-parallelism-increase)：这个配置能够指定在扩容前需要满足的最小额外增加的并行总数。例如，你的
 Job 由并行度为 2 的 Source 和并行度为 2 的 Sink组成，并行总数为 4。这个配置的默认值是 `1`，所以任意并行总数的增加都会导致重启。
 
-- **Configure periodic checkpointing for stateful jobs**: Reactive mode 
restores from the latest completed checkpoint on a rescale event. If no 
periodic checkpointing is enabled, your program will lose its state. 
Checkpointing also configures a **restart strategy**. Reactive Mode will 
respect the configured restarting strategy: If no restarting strategy is 
configured, reactive mode will fail your job, instead of scaling it.
+#### 建议
 
-- Downscaling in Reactive Mode might cause longer stalls in your processing 
because Flink waits for the heartbeat between JobManager and the stopped 
TaskManager(s) to time out. You will see that your Flink job is stuck for 
roughly 50 seconds before redeploying your job with a lower parallelism.
+- **为有状态的 Job 配置周期性的 Checkpoint**：Reactive 模式在伸缩时通过最新完成的 Checkpoint 
恢复。如果没有配置周期性的 Checkpoint，你的程序会丢失状态。Checkpoint 
同时还配置了**重启策略**，Reactive会使用配置的重启策略：如果没有设置，Reactive 模式会让 Job 失败而不是运行伸缩。
 
-  The default timeout is configured to 50 seconds. Adjust the 
[`heartbeat.timeout`]({{< ref "docs/deployment/config">}}#heartbeat-timeout) 
configuration to a lower value, if your infrastructure permits this. Setting a 
low heartbeat timeout can lead to failures if a TaskManager fails to respond to 
a heartbeat, for example due to a network congestion or a long garbage 
collection pause. Note that the [`heartbeat.interval`]({{< ref 
"docs/deployment/config">}}#heartbeat-interval) always needs to be lower than 
the timeout.
+- 在 Ractive 模式下缩容可能会导致长时间的停顿，因为 Flink 需要等待 JobManager 和已经停止的 TaskManager 
间心跳超时。当你降低 Job 并行度时，你会发现 Job 会停顿大约 50 秒左右。
+  
+  这是由于默认的心跳超时时间是 50 秒。在你的基础设施允许的情况下，可以降低 [`heartbeat.timeout`]({{< ref 
"docs/deployment/config">}}#heartbeat-timeout) 的值。但是降低超时时间，会导致比如在网络拥堵或者 GC 
Pause 的时候，TaskManager 无法响应心跳。需要注意的是，[`heartbeat.interval`]({{< ref 
"docs/deployment/config">}}#heartbeat-interval) 配置需要低于超时时间。
 
+### 限制

Review comment:
       How about "局限性" instead of "限制"?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

