WencongLiu commented on code in PR #23666:
URL: https://github.com/apache/flink/pull/23666#discussion_r1390574350
##########
docs/content.zh/docs/dev/datastream/dataset_migration.md:
##########
@@ -134,10 +131,9 @@ DataStreamSource<> source = StreamExecutionEnvironment.createInput(inputFormat)
</tbody>
</table>
-### Sinks
+### 写
Review Comment:
Keep it to "Sinks".
##########
docs/content.zh/docs/dev/datastream/dataset_migration.md:
##########
@@ -62,44 +58,45 @@ The first step of migrating an application from DataSet API to DataStream API is
<tr>
<td>
{{< highlight "java" >}}
-// Create the execution environment
+// 创建执行环境
ExecutionEnvironment.getExecutionEnvironment();
-// Create the local execution environment
+// 创建本地执行环境
ExecutionEnvironment.createLocalEnvironment();
-// Create the collection environment
+// 创建 collection 环境
new CollectionEnvironment();
-// Create the remote environment
+// 创建远程执行环境
ExecutionEnvironment.createRemoteEnvironment(String host, int port, String... jarFiles);
{{< /highlight >}}
</td>
<td>
{{< highlight "java" >}}
-// Create the execution environment
+// 创建执行环境
StreamExecutionEnvironment.getExecutionEnvironment();
-// Create the local execution environment
+// 创建本地执行环境
StreamExecutionEnvironment.createLocalEnvironment();
-// The collection environment is not supported.
-// Create the remote environment
+// 不支持 collection 环境
+// 创建远程执行环境
StreamExecutionEnvironment.createRemoteEnvironment(String host, int port, String... jarFiles);
{{< /highlight >}}
</td>
</tr>
</tbody>
</table>
-Unlike DataSet, DataStream supports processing on both bounded and unbounded data streams. Thus, user needs to explicitly set the execution mode
-to `RuntimeExecutionMode.BATCH` if that is expected.
+与 DataSet 不同,DataStream 支持对有界和无界数据流进行处理。
+
+如果需要的话,用户可以显式地将执行模式设置为 `RuntimeExecutionMode.BATCH`。
```java
StreamExecutionEnvironment executionEnvironment = // [...];
executionEnvironment.setRuntimeMode(RuntimeExecutionMode.BATCH);
```
-## Using the streaming sources and sinks
+## 流读和流写
Review Comment:
I think a better expression is “设置streaming类型的Source和Sink”.
##########
docs/content.zh/docs/dev/datastream/dataset_migration.md:
##########
@@ -62,44 +58,45 @@ The first step of migrating an application from DataSet API to DataStream API is
<tr>
<td>
{{< highlight "java" >}}
-// Create the execution environment
+// 创建执行环境
ExecutionEnvironment.getExecutionEnvironment();
-// Create the local execution environment
+// 创建本地执行环境
ExecutionEnvironment.createLocalEnvironment();
-// Create the collection environment
+// 创建 collection 环境
new CollectionEnvironment();
-// Create the remote environment
+// 创建远程执行环境
ExecutionEnvironment.createRemoteEnvironment(String host, int port, String... jarFiles);
{{< /highlight >}}
</td>
<td>
{{< highlight "java" >}}
-// Create the execution environment
+// 创建执行环境
StreamExecutionEnvironment.getExecutionEnvironment();
-// Create the local execution environment
+// 创建本地执行环境
StreamExecutionEnvironment.createLocalEnvironment();
-// The collection environment is not supported.
-// Create the remote environment
+// 不支持 collection 环境
+// 创建远程执行环境
StreamExecutionEnvironment.createRemoteEnvironment(String host, int port, String... jarFiles);
{{< /highlight >}}
</td>
</tr>
</tbody>
</table>
-Unlike DataSet, DataStream supports processing on both bounded and unbounded data streams. Thus, user needs to explicitly set the execution mode
-to `RuntimeExecutionMode.BATCH` if that is expected.
+与 DataSet 不同,DataStream 支持对有界和无界数据流进行处理。
+
+如果需要的话,用户可以显式地将执行模式设置为 `RuntimeExecutionMode.BATCH`。
```java
StreamExecutionEnvironment executionEnvironment = // [...];
executionEnvironment.setRuntimeMode(RuntimeExecutionMode.BATCH);
```
-## Using the streaming sources and sinks
+## 流读和流写
-### Sources
+### 读
Review Comment:
Keep it to "Sources".
##########
docs/content.zh/docs/dev/datastream/dataset_migration.md:
##########
@@ -679,19 +673,19 @@ dataStream3.join(dataStream4)
</tbody>
</table>
-### Category 4
+### 第四类
-The behaviors of the following DataSet APIs are not supported by DataStream.
+以下 DataSet API 的行为不被 DataStream 支持。
* RangePartition
* GroupCombine
-## Appendix
+## 附录
#### EndOfStreamWindows
-The following code shows the example of `EndOfStreamWindows`.
+以下代码显示了 `EndOfStreamWindows` 示例实现。
Review Comment:
"显示" -> "展示".
##########
docs/content.zh/docs/dev/datastream/dataset_migration.md:
##########
@@ -336,13 +332,12 @@ dataStream.keyBy(value -> value.f0)
</tbody>
</table>
-### Category 2
+### 第二类
-For category 2, the behavior of these DataSet APIs can be achieved by other APIs with different semantics in DataStream, which might require some code changes for
-migration but will result in the same execution efficiency.
+对于第二类,这些 DataSet API 的行为可以通过 DataStream 中具有不同语义的其他 API 来实现,这可能需要更改一些代码来进行迁移,但仍保持相同的执行效率。
-Operations on a full DataSet correspond to the global window aggregation in DataStream with a custom window that is triggered at the end of the inputs. The `EndOfStreamWindows`
-in the [Appendix]({{< ref "docs/dev/datastream/dataset_migration#endofstreamwindows" >}}) shows how such a window can be implemented. We will reuse it in the rest of this document.
+对整个 DataSet 的操作相当于 DataStream 中自定义的在输入结束时触发的全局窗口聚合。
Review Comment:
"DataSet中存在对整个DataSet进行操作的API。这些API在DataStream中可以用一个全局窗口来实现,该全局窗口只会在输入数据结束时触发窗口内数据的计算。"
##########
docs/content.zh/docs/dev/datastream/dataset_migration.md:
##########
@@ -601,13 +596,12 @@ dataSet1.rightOuterJoin(dataSet2)
</tbody>
</table>
-### Category 3
+### 第三类
-For category 3, the behavior of these DataSet APIs can be achieved by other APIs with different semantics in DataStream, with potentially additional cost in execution efficiency.
+对于第三类,这些 DataSet API 的行为可以通过 DataStream 中具有不同语义的其他 API 来实现,但可能会增加额外的执行效率成本。
-Currently, DataStream API does not directly support aggregations on non-keyed streams (subtask-scope aggregations). In order to do so, we need to first assign the subtask id
-to the records, then turn the stream into a keyed stream. The `AddSubtaskIdMapFunction` in the [Appendix]({{< ref "docs/dev/datastream/dataset_migration#addsubtaskidmapfunction" >}}) shows how
-to do that, and we will reuse it in the rest of this document.
+目前,DataStream API 不直接支持 non-keyed 流上的聚合(子任务范围聚合)。为此,我们需要首先将子任务 ID 分配给记录,然后将流转换为 keyed 流。
Review Comment:
"(子任务范围聚合)" -> "(对subtask内的数据进行聚合)"
"子任务 ID" -> "subtask ID"
##########
docs/content.zh/docs/dev/datastream/dataset_migration.md:
##########
@@ -761,7 +755,8 @@ public class EndOfStreamWindows extends WindowAssigner<Object, TimeWindow> {
#### AddSubtaskIDMapFunction
-The following code shows the example of `AddSubtaskIDMapFunction`.
+以下代码显示了 `AddSubtaskIDMapFunction` 示例实现。
Review Comment:
"显示" -> "展示".
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]