liying919 commented on a change in pull request #12012:
URL: https://github.com/apache/flink/pull/12012#discussion_r421435182
##########
File path: docs/training/etl.zh.md
##########
@@ -131,36 +117,27 @@ public static class NYCEnrichment implements FlatMapFunction<TaxiRide, EnrichedR
}
{% endhighlight %}
-With the `Collector` provided in this interface, the `flatmap()` method can emit as many stream
-elements as you like, including none at all.
+使用接口中提供的 `Collector` ,`flatmap()` 可以发射你想要的任意数量的元素,也可以一个都不发。
{% top %}
## Keyed Streams
### `keyBy()`
-It is often very useful to be able to partition a stream around one of its attributes, so that all
-events with the same value of that attribute are grouped together. For example, suppose you wanted
-to find the longest taxi rides starting in each of the grid cells. Thinking in terms of a SQL query,
-this would mean doing some sort of GROUP BY with the `startCell`, while in Flink this is done with
-`keyBy(KeySelector)`
+将一个流根据其中的一些属性来进行分区是十分有用的,这样我们可以使所有具有相同属性的事件分到相同的组里。例如,如果你想找到从每个网格单元出发的最远的出租车行程。按 SQL 查询的方式来考虑,这意味着要对 `startCell` 进行 GROUP BY 再排序,在 Flink 中这部分可以用 `keyBy(KeySelector)` 实现。
{% highlight java %}
rides
.flatMap(new NYCEnrichment())
.keyBy("startCell")
{% endhighlight %}
-Every `keyBy` causes a network shuffle that repartitions the stream. In general this is pretty
-expensive, since it involves network communication along with serialization and deserialization.
+每个 `keyBy` 会通过 shuffle 来为数据流进行重新分区。总体来说这个开销是很大的,它涉及网络通信、序列化和反序列化。
<img src="{{ site.baseurl }}/fig/keyBy.png" alt="keyBy and network shuffle" class="offset" width="45%" />
-In the example above, the key has been specified by a field name, "startCell". This style of key
-selection has the drawback that the compiler is unable to infer the type of the field being used for
-keying, and so Flink will pass around the key values as Tuples, which can be awkward. It is
-better to use a properly typed KeySelector, e.g.,
+在上面的例子中,将 "startCell" 这个字段定义为key。这种选择key的方式有个缺点,就是编译器无法推断用作键的字段的类型,所以 Flink 会将键值作为元组传递,这有时候会比较难处理。所以最好还是使用一个合适的 KeySelector,
Review comment:
Here I will also translate "key" as “键”, in line with the translation convention:
```suggestion
在上面的例子中,将 "startCell" 这个字段定义为键。这种选择键的方式有个缺点,就是编译器无法推断用作键的字段的类型,所以 Flink 会将键值作为元组传递,这有时候会比较难处理。所以最好还是使用一个合适的 KeySelector,比如:
```
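For readers checking the translated passage against the behavior it describes, here is a minimal plain-Java sketch of what keying by a typed selector means conceptually. It does not use Flink: `KeyBySketch`, `Ride`, and `partitionByKey` are hypothetical names, and the hash-modulo routing is an assumption made for illustration, not Flink's actual partitioning scheme. The point it shows is that a typed selector (here a lambda returning `startCell`) lets the compiler infer the key type, and that events with equal keys land in the same partition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class KeyBySketch {

    // Hypothetical stand-in for the EnrichedRide type quoted in the docs.
    static final class Ride {
        final int startCell;
        final long rideId;

        Ride(int startCell, long rideId) {
            this.startCell = startCell;
            this.rideId = rideId;
        }
    }

    // Route each event to a partition derived from the hash of its key, so that
    // events with equal keys always end up together. K is inferred from the selector.
    static <T, K> List<List<T>> partitionByKey(
            List<T> events, Function<T, K> keySelector, int parallelism) {
        List<List<T>> partitions = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            partitions.add(new ArrayList<>());
        }
        for (T event : events) {
            K key = keySelector.apply(event);
            // floorMod keeps the index non-negative even for negative hash codes.
            int index = Math.floorMod(key.hashCode(), parallelism);
            partitions.get(index).add(event);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Ride> rides = Arrays.asList(
                new Ride(7, 1L), new Ride(12, 2L), new Ride(7, 3L));

        // Typed selector: the key type (Integer) is known at compile time.
        List<List<Ride>> partitions = partitionByKey(rides, ride -> ride.startCell, 4);

        for (int i = 0; i < partitions.size(); i++) {
            for (Ride ride : partitions.get(i)) {
                System.out.println("partition " + i + ": ride " + ride.rideId
                        + " (startCell " + ride.startCell + ")");
            }
        }
    }
}
```

In a real job the regrouping happens across the network between parallel operator instances, which is the cost the quoted paragraph about shuffles refers to; this sketch only models the grouping rule in memory.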