[GitHub] [flink] liuzhuang2017 commented on a diff in pull request #20660: [FLINK-28998] Translate'Fine-Grained Resource Management' page into Chinese

GitBox Fri, 26 Aug 2022 19:26:01 -0700


liuzhuang2017 commented on code in PR #20660:
URL: https://github.com/apache/flink/pull/20660#discussion_r952051966



##########
docs/content.zh/docs/deployment/finegrained_resource.md:
##########
@@ -23,97 +23,86 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-# Fine-Grained Resource Management
 
-Apache Flink works hard to auto-derive sensible default resource requirements 
for all applications out of the box. 
-For users who wish to fine-tune their resource consumption, based on knowledge 
of their specific scenarios, Flink offers **fine-grained resource management**.
+# 细粒度资源管理
 
-This page describes the fine-grained resource management’s usage, applicable 
scenarios, and how it works.
+Apache Flink 努力为所有开箱即用的应用程序自动派生合理的默认资源需求。对于希望更精细化调节资源消耗的用户，基于对特定场景的了解，Flink 
提供了**细粒度资源管理**。
+本文介绍了细粒度资源管理的使用、适用场景以及工作原理。
 
 {{< hint warning >}}
-**Note:** This feature is currently an MVP (“minimum viable product”) feature 
and only available to [DataStream API]({{< ref "docs/dev/datastream/overview" 
>}}).
+**注意:** 本特性是当前的一个最简化产品(版本)的特性，它支持只在 DataStream API [DataStream API]({{< ref 
"docs/dev/datastream/overview" >}})中使用。
 {{< /hint >}}
 
-## Applicable Scenarios
+## 使用场景
 
-Typical scenarios that potentially benefit from fine-grained resource 
management are where:
+可能从细粒度资源管理中受益的典型场景包括：
 
-  - Tasks have significantly different parallelisms.
+- Tasks 有显著不同的并行度的场景。
 
-  - The resource needed for an entire pipeline is too much to fit into a 
single slot/task manager.
+- 整个pipeline需要的资源太大了以致不能和单一的slot/task Manager相适应的场景。
 
-  - Batch jobs where resources needed for tasks of different stages are 
significantly different
+- 批处理作业，其中不同stage的task所需的资源差异明显。
 
-An in-depth discussion on why fine-grained resource management can improve 
resource efficiency for the above scenarios is presented in [How it improves 
resource efficiency](#how-it-improves-resource-efficiency).
+在它如何提高资源利用率 [How it improves resource 
efficiency](#how-it-improves-resource-efficiency)部分将会对细粒度资源管理为什么在以上使用场景中可以提高资源利用率作深入的讨论。
 
-## How it works
 
-As described in [Flink Architecture]({{< ref 
"docs/concepts/flink-architecture" >}}#anatomy-of-a-flink-cluster),
-task execution resources in a TaskManager are split into many slots.
-The slot is the basic unit of both resource scheduling and resource 
requirement in Flink's runtime.
+## 工作原理
 
+如Flink架构 [Flink Architecture]({{< ref "docs/concepts/flink-architecture" 
>}}#anatomy-of-a-flink-cluster)中描述,
+在一个TaskManager中,执行task时使用的资源被分割成许多个slots.
+slot既是资源调度的基本单元,又是flink运行时申请资源的基本单元.
 {{< img src="/fig/dynamic_slot_alloc.png" class="center" >}}
 
-With fine-grained resource management, the slots requests contain specific 
resource profiles, which users can specify.
-Flink will respect those user-specified resource requirements and dynamically 
cut an exactly-matched slot out of the TaskManager’s available
-resources. As shown above, there is a requirement for a slot with 0.25 Core 
and 1GB memory, and Flink allocates *Slot 1* for it.
+对于细粒度资源管理,Slot资源请求包含用户指定的特定的资源配置文件。Flink会遵从这些用户指定的资源请求并从TaskManager可用的资源中动态地切分出精确匹配的slot。如上图所示，对于一个slot，0.25core和1G内存的资源申请，Flink为它分配一个slot。
 
 {{< hint info >}}
-Previously in Flink, the resource requirement only contained the required 
slots, without fine-grained resource
-profiles, namely **coarse-grained resource management**. The TaskManager had a 
fixed number of identical slots to fulfill those requirements.
+Flink之前的资源申请只包含必须指定的slots,但没有精细化的资源配置,这是一种粗粒度的资源管理.在这种管理方式下, 
TaskManager以固定相同的slots的个数的方式来满足资源需求。
 {{< /hint >}}
 
-For the resource requirement without a specified resource profile, Flink will 
automatically decide a resource profile.
-Currently, the resource profile of it is calculated from [TaskManager’s total 
resource]({{< ref "docs/deployment/memory/mem_setup_tm" >}})
-and [taskmanager.numberOfTaskSlots]({{< ref "docs/deployment/config" 
>}}#taskmanager-numberoftaskslots), just
-like in coarse-grained resource management. As shown above, the total resource 
of TaskManager is 1 Core and 4 GB memory and the number of task slots
-is set to 2, *Slot 2* is created with 0.5 Core and 2 GB memory for the 
requirement without a specified resource profile.
+对于没有指定资源配置的资源请求，Flink会自动决定资源配置。粗粒度资源管理当前被计算的资源来自TaskManager总资源[TaskManager’s 
total resource]({{< ref "docs/deployment/memory/mem_setup_tm" 
>}})和TaskManager的总slot数[taskmanager.numberOfTaskSlots]({{< ref 
"docs/deployment/config" >}}#taskmanager-numberoftaskslots)。
+如上所示，TaskManager的总资源是1Core和4G内存，task的slot数设置为2，*Slot 2* 
被创建，并申请0.5core和2G的内存而没有指定资源配置。
+在分配slot1和slot2后，在TaskManager留下0.25核和1G的内存作为未使用资源.
 
-After the allocation of *Slot 1* and *Slot 2*, there is 0.25 Core and 1 GB 
memory remaining as the free resources in the
-TaskManager. These free resources can be further partitioned to fulfill the 
following resource requirements.
+详情请参考资源分配策略 [Resource Allocation Strategy](#resource-allocation-strategy)。
 
-Please refer to [Resource Allocation Strategy](#resource-allocation-strategy) 
for more details.
 
-## Usage
+## 用法
 
-To use fine-grained resource management, you need to:
+为了可以使用细粒度的资源管理,需要做以下步骤:
 
-  - Configure to enable fine-grained resource management.
+- 配置细粒度的资源管理
 
-  - Specify the resource requirement.
+- 指定资源请求
 
-### Enable Fine-Grained Resource Management
-
-To enable fine-grained resource management, you need to configure the 
[cluster.fine-grained-resource-management.enabled]({{< ref 
"docs/deployment/config" >}}#cluster-fine-grained-resource-management-enabled) 
to true.
+### Enable 细粒度资源管理
 
+为了enable细粒度的资源管理配置,需要将[cluster.fine-grained-resource-management.enabled]的值设置为true({{<
 ref "docs/deployment/config" 
>}}#cluster-fine-grained-resource-management-enabled)。
 {{< hint danger >}}
-Without this configuration, the Flink runtime cannot schedule the slots with 
your specified resource requirement and the job will fail with an exception.
+没有该配置,Flink运行job时并不能按照你指定的资源需求分配slots,并且job会失败抛出异常。
 {{< /hint >}}
 
-### Specify Resource Requirement for Slot Sharing Group
-
-Fine-grained resource requirements are defined on slot sharing groups. A slot 
sharing group is a hint that tells the JobManager operators/tasks in it CAN be 
put into the same slot.
-
-For specifying the resource requirement, you need to:
+### 为Slot共享组指定资源请求
 
-  - Define the slot sharing group and the operators it contains.
+细粒度资源请求是基于slot共享组定义的。一个slot共享组是一个切入点，这意味着在TaskManager中的算子和tasks可以被置于相同的slot。
 
-  - Specify the resource of the slot sharing group.
+对于指定资源请求,应该:
 
-There are two approaches to define the slot sharing group and the operators it 
contains:
+- 定义Slot共享组和它所包含的操作算子

Review Comment:
   A colon is missing at the end of this line.
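
   Side note for checking the numbers in the translated "How it works" paragraphs of this hunk: the figure describes a TaskManager with 1 core and 4 GB of memory, `taskmanager.numberOfTaskSlots` set to 2, *Slot 1* requested with 0.25 core and 1 GB, and *Slot 2* created with the default profile. A minimal sketch of that arithmetic, assuming the default profile is simply the total divided evenly by the slot count. The `Resource` class and its methods are hypothetical helpers for illustration, not Flink APIs.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Resource:
    """Hypothetical two-dimensional resource profile (not a Flink class)."""
    cpu_cores: Fraction
    memory_gb: Fraction

    def minus(self, other: "Resource") -> "Resource":
        # Subtract an allocated slot from the remaining resources.
        return Resource(self.cpu_cores - other.cpu_cores,
                        self.memory_gb - other.memory_gb)

    def split_evenly(self, parts: int) -> "Resource":
        # Assumed default profile: total resource divided by the slot count.
        return Resource(self.cpu_cores / parts, self.memory_gb / parts)


# Values taken from the paragraph under review.
total = Resource(Fraction(1), Fraction(4))      # 1 core, 4 GB
slot_1 = Resource(Fraction(1, 4), Fraction(1))  # explicit request: 0.25 core, 1 GB
slot_2 = total.split_evenly(2)                  # no profile given: 0.5 core, 2 GB

# What is left in the TaskManager after both slots are cut out.
free = total.minus(slot_1).minus(slot_2)
print(free)
```

   The remainder works out to 0.25 core and 1 GB, which matches the English source, so the translated sentence should state the same figures.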



##########
docs/content.zh/docs/deployment/finegrained_resource.md:
##########
@@ -232,93 +226,68 @@ ssg_with_resource = SlotSharingGroup.builder('ssg') \
             .build()
 
 # Build a slot sharing group without specific resource and then register the resource of it in StreamExecutionEnvironment
+# 构建一个 slot 共享组未指定资源，然后在 StreamExecutionEnvironment中注册资源
 ssg_with_name = SlotSharingGroup.builder('ssg').build()
 env.register_slot_sharing_group(ssg_with_resource)
 ```
 {{< /tab >}}
 {{< /tabs >}}
 
 {{< hint warning >}}
-**Note:** You can construct a SlotSharingGroup with or without specifying its 
resource profile.
-With specifying the resource profile, you need to explicitly set the **CPU 
cores** and **Task Heap Memory** with a positive value, other components are 
optional.
+**提示:** 可以指定或者不指定资源配置构造 SlotSharingGroup。
+对于指定资源配置，必须明确地将 **CPU cores*** 和 **Task Heap Memory** 设置成正数值，其它设置则是可选的。
 {{< /hint >}}
 
-## Limitations
-
-Since fine-grained resource management is a new, experimental feature, not all 
features supported by the default
-scheduler are also available with it. The Flink community is working on 
addressing these limitations.
-
-  - **No support for the [Elastic Scaling]({{< ref 
"docs/deployment/elastic_scaling" >}})**. The elastic scaling only supports 
slot requests without specified-resource at the moment.
-
-  - **No support for task manager redundancy**. The 
[slotmanager.redundant-taskmanager-num]({{< ref "docs/deployment/config" 
>}}#slotmanager-redundant-taskmanager-num) is used to start redundant 
TaskManagers to speed up job recovery. This config option will not take effect 
in fine-grained resource management at the moment.
-
-  - **No support for evenly spread out slot strategy**. This strategy tries to 
spread out the slots evenly across all available TaskManagers. The strategy is 
not supported in the first version of fine-grained resource management and 
[cluster.evenly-spread-out-slots]({{< ref "docs/deployment/config" 
>}}#cluster-evenly-spread-out-slots) will not take effect in it at the moment.
-
-  - **Limited integration with Flink’s Web UI**. Slots in fine-grained 
resource management can have different resource specs. The web UI only shows 
the slot number without its details at the moment.
-
-  - **Limited integration with batch jobs**. At the moment, fine-grained 
resource management requires batch workloads to be executed with types of all 
edges being BLOCKING. To do that, you need to configure 
[fine-grained.shuffle-mode.all-blocking]({{< ref "docs/deployment/config" 
>}}#fine-grained-shuffle-mode-all-blocking) to `true`. Notice that this may 
affect the performance. See 
[FLINK-20865](https://issues.apache.org/jira/browse/FLINK-20865) for more 
details.
+## 局限
 
-  - **Hybrid resource requirements are not recommended**. It is not 
recommended to specify the resource requirements only for some parts of the job 
and leave the requirements for the rest unspecified. Currently, the unspecified 
requirement can be fulfilled with slots of any resource. The actual resource 
acquired by it can be inconsistent across different job executions or failover.
+因为细粒度资源管理是新的实验性特性,并不是所有的特性都被默认的调度器所支持.Flink社区正努力解决并突破这些限制。
+- **不支持[弹性伸缩]({{< ref "docs/deployment/elastic_scaling" >}})**. 
弹性伸缩目前只支持不指定资源的slot请求。
+- **不支持TaskManager的冗余** TaskManager冗余 
[slotmanager.redundant-taskmanager-num]({{< ref "docs/deployment/config" 
>}}#slotmanager-redundant-taskmanager-num) 
用于启动冗余的TaskManager以加速job恢复。当前该配置在细粒度资源管理中不生效。
+- **不支持均匀分布的插槽策略** 此策略试图在所有可用的TaskManager中均匀分配插槽 
[cluster.evenly-spread-out-slots]({{< ref "docs/deployment/config" 
>}}#cluster-evenly-spread-out-slots)。该策略在细粒度资源管理的第一个版本中不受支持，目前不会生效。
+- **与Flink Web UI有限的集成** 在细粒度的资源管理中,Slots会有不同的资源规格.目前Web UI页面只显示 slot 
数量而不显示具体详情。
+- **与批作业有限的集成** 目前，细粒度资源管理需要在所有边缘都被阻塞的情况下执行批处理工作负载。为了达到该实现，需要将配置 
[fine-grained.shuffle-mode.all-blocking]({{< ref "docs/deployment/config" 
>}}#fine-grained-shuffle-mode-all-blocking)设置为true。注意这样可能会影响性能。详情请见[FLINK-20865](https://issues.apache.org/jira/browse/FLINK-20865)。
+- **不建议使用混合资源需求** 
不建议仅为工作的某些部分指定资源需求，而未指定其余部分的需求。目前，任何资源的插槽都可以满足未指定的要求。它获取的实际资源可能在不同的作业执行或故障切换中不一致。
+- **Slot分配结果可能不是最优** 正因为Slot需求包含资源的多维度方面,所以,Slot分配实际上是一个多维度问题,这是一个NP难题. 
因些,在一些使用场景中,默认的 
[资源分配策略](#resource-allocation-strategy)可能不会使得Slot分配达到最优,而且还会导致资源碎片或者资源分配失败.
 
-  - **Slot allocation result might not be optimal**. As the slot requirements 
contain multiple dimensions of resources, the slot allocation is indeed a 
multi-dimensional packing problem, which is NP-hard. The default [resource 
allocation strategy](#resource-allocation-strategy) might not achieve optimal 
slot allocation and can lead to resource fragments or resource allocation 
failure in some scenarios.
+## 注意
+- **设置 Slot 共享组可能改变性能** 为可链式操作的算子设置不同的slot共享组可能会导致链式操作 [operator chains]({{< 
ref "docs/dev/datastream/operators/overview" 
>}}#task-chaining-and-resource-groups)产生割裂,从而改变性能.
+- **Slot 共享组不会限制算子的调度** 
Slot共享组仅仅意味着调度器可以使被分组的算子被部署到中一个Slot中,但无法保证调度器总是和被分组的算子部署绑定在一起。如果被分组算子被部署到单独的Slot中，Slot资源将从特定的资源组需求中派生而来。
 
-## Notice
+## 深入讨论
 
-  - **Setting the slot sharing group may change the performance**. Setting 
chain-able operators to different slot sharing groups may break [operator 
chains]({{< ref "docs/dev/datastream/operators/overview" 
>}}#task-chaining-and-resource-groups), and thus change the performance.
+### 如何提高资源使用率
 
-  - **Slot sharing group will not restrict the scheduling of operators**. The 
slot sharing group only hints the scheduler that the grouped operators CAN be 
deployed into a shared slot. There's no guarantee that the scheduler always 
deploys the grouped operator together. In cases grouped operators are deployed 
into separate slots, the slot resources will be derived from the specified 
group requirement.
+这部分，我们对细粒度资源管理如何提高资源利用率作深入讨论，这会有助于你理解它对我们的 jobs 是否有益。
+之前的 Flink 采用的一种粗粒度资源管理的方式，tasks 被提前定义部署，通常被分配相同的 slots 而没有每个 slot 包含多少资源的概念。
+对于许多jobs，使用粗粒度的资源管理并简单地把所有的tasks放入一个 [Slot共享组]({{< ref 
"docs/dev/datastream/operators/overview" 
>}}#set-slot-sharing-group)中运行，就资源利用率而言，也能运行得很好。
 
-## Deep Dive
+- 对于许多有相同并行度的 tasks 的流作业而言，每个 slot 
会包含[整个pipeline](https://flink.apache.org/2020/12/15/pipelined-region-sheduling.html#pipelined-regions)。理想情况条件下，所有的
 pipelines 应该使用大致相同的资源，这可以容易被满足通过调节相同 slot 的资源。
 
-### How it improves resource efficiency
+- 
tasks的资源消耗随时间变化不同。当一个task的资源消耗减少，其他的资源可以被另外一个task使用，该task的消耗增加。这就是被称为“调峰填谷效应”的现象，它降低了所需要的总体需求。
 
-In this section, we deep dive into how fine-grained resource management 
improves resource efficiency, which can help you to understand whether it can 
benefit your jobs.
+尽管如此，有些情况下使用粗粒度资源管理效果并不好。
 
-Previously, Flink adopted a coarse-grained resource management approach, where 
tasks are deployed into predefined,
-usually identical slots without the notion of how many resources each slot 
contains. For many jobs, using coarse-grained
-resource management and simply putting all tasks into one [slot sharing 
group]({{< ref "docs/dev/datastream/operators/overview" 
>}}#set-slot-sharing-group) works well enough in terms of resource utilization.
+- Tasks会有不同的并行度。有时，这种不同的并行度是不可避免的。例如，象 source/sink/lookup 
这些类别的tasks的并行度可能被分区数和外部上下游系统的 IO 负载所限制。在这种情况下，拥有更少的tasks的slots会需要更少的资源相比tasks的 
[整个pipeline](https://flink.apache.org/2020/12/15/pipelined-region-sheduling.html#pipelined-regions)。
+- 
有时[整个pipeline](https://flink.apache.org/2020/12/15/pipelined-region-sheduling.html#pipelined-regions)
 需要的资源可能会太大以致难于与单一的 slot/TaskManager 
的场景相适应。在这种情况下，pipeline需要被分割成多个SSGs，它们可能不总是有相同的资源需求。
+- 对于批作业，不是所有的 tasks 能够在同时被执行。因此，整个 pipeline 的瞬时资源需求而时间变化。
 
-  - For many streaming jobs that all tasks have the same parallelism, each 
slot will contain an [entire 
pipeline](https://flink.apache.org/2020/12/15/pipelined-region-sheduling.html#pipelined-regions).
 Ideally, all pipelines should use roughly the same resources, which can be 
satisfied easily by tuning the resources of the identical slots.
+试图以相同的slots执行所有的tasks,这样会造成非最优的资源利用率。相同 slot 
的资源能够满足最高的资源需求，这对于其他资源需求将是浪费的。当涉及到像GPU这样昂贵的外部资源时，这样的浪费将是难以承受的。细粒度资源管理运用不同资源的slots提高了资源利用率在种使用场景中。
 
-  - Resource consumption of tasks varies over time. When consumption of a task 
decreases, the extra resources can be used by another task whose consumption is 
increasing. This, known as the peak shaving and valley filling effect, reduces 
the overall resource needed.
-
-However, there are cases where coarse-grained resource management does not 
work well.
-
-  - Tasks may have different parallelisms. Sometimes, such different 
parallelisms cannot be avoided. E.g., the parallelism of source/sink/lookup 
tasks might be constrained by the partitions and IO load of the external 
upstream/downstream system. In such cases, slots with fewer tasks would need 
fewer resources than those with the [entire 
pipeline](https://flink.apache.org/2020/12/15/pipelined-region-sheduling.html#pipelined-regions)
 of tasks.
-
-  - Sometimes the resource needed for the [entire 
pipeline](https://flink.apache.org/2020/12/15/pipelined-region-sheduling.html#pipelined-regions)
 might be too much to be put into a single slot/TaskManager. In such cases, the 
pipeline needs to be split into multiple SSGs, which may not always have the 
same resource requirement.
-
-  - For batch jobs, not all the tasks can be executed at the same time. Thus, 
the instantaneous resource requirement of the pipeline changes over time.
-
-Trying to execute all tasks with identical slots can result in non-optimal 
resource utilization. The resource of the identical slots
-has to be able to fulfill the highest resource requirement, which will be 
wasteful for other requirements. When expensive external resources
-like GPU are involved, such waste can become even harder to afford. The 
fine-grained resource management leverages slots of different resources
-to improve resource utilization in such scenarios.
-
-### Resource Allocation Strategy
-
-In this section, we talk about the slot partitioning mechanism in Flink 
runtime and the resource allocation strategy, including how
-the Flink runtime selects a TaskManager to cut slots and allocates 
TaskManagers on [Native Kubernetes]({{< ref 
"docs/deployment/resource-providers/native_kubernetes" >}})
-and [YARN]({{< ref "docs/deployment/resource-providers/yarn" >}}). Note that 
the resource allocation strategy is pluggable in
-Flink runtime and here we introduce its default implementation in the first 
step of fine-grained resource
-management. In the future, there might be various strategies that users can 
select for different scenarios.
+### 资源分配策略
 
+本节讨论的是Flink作业运行时的Slot分区机制和资源分配策略,包括在 YARN 和 Kubernetes 中运行 Flink 作业时,Flink 
如何选择 TaskManager 来切分成 Slots 和如何分配 TaskManager 的.({{< ref 
"docs/deployment/resource-providers/native_kubernetes" >}})

Review Comment:
   The link [Native Kubernetes]({{< ref "docs/deployment/resource-providers/native_kubernetes" >}}) is missing here, and there should be a space before and after "Flink" and "Slot".



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

