YngwieWang commented on a change in pull request #9299: [FLINK-13405][docs-zh]
Translate "Basic API Concepts" page into Chinese
URL: https://github.com/apache/flink/pull/9299#discussion_r317599400
##########
File path: docs/dev/api_concepts.zh.md
##########
@@ -739,164 +646,125 @@ class WordWithCount(var word: String, var count: Int) {
val input = env.fromElements(
new WordWithCount("hello", 1),
- new WordWithCount("world", 2)) // Case Class Data Set
+ new WordWithCount("world", 2)) // Case Class 数据集
-input.keyBy("word")// key by field expression "word"
+input.keyBy("word")// 以字段表达式“word”为键
{% endhighlight %}
</div>
</div>
-#### Primitive Types
+#### 基本数据类型
-Flink supports all Java and Scala primitive types such as `Integer`, `String`, and `Double`.
+Flink 支持所有 Java 和 Scala 的基本数据类型,如 `Integer`、`String` 和 `Double`。
-#### General Class Types
+#### 常规类
-Flink supports most Java and Scala classes (API and custom).
-Restrictions apply to classes containing fields that cannot be serialized, like file pointers, I/O streams, or other native resources. Classes that follow the Java Beans conventions work well in general.
+Flink 支持大部分 Java 和 Scala 的类(包括 API 中的类和自定义类)。
+但包含无法序列化字段的类除外,比如文件指针、I/O 流或其他本地资源。遵循 Java Beans 约定的类通常可以很好地工作。
-All classes that are not identified as POJO types (see POJO requirements above) are handled by Flink as general class types.
-Flink treats these data types as black boxes and is not able to access their content (i.e., for efficient sorting). General types are de/serialized using the serialization framework [Kryo](https://github.com/EsotericSoftware/kryo).
+Flink 将所有未识别为 POJO 类型的类(参见上文的 POJO 要求)都作为常规类处理。
+Flink 将这些数据类型视为黑盒,无法访问其内容(例如无法用于高效排序)。常规类使用 [Kryo](https://github.com/EsotericSoftware/kryo) 序列化框架进行序列化和反序列化。
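
As a hedged illustration (a hypothetical class, not from the original page): a type that breaks the POJO rules, for example by lacking a public no-argument constructor, is handled as a general class type and goes through Kryo.

{% highlight java %}
// Hypothetical example: not a POJO, because there is no public no-arg
// constructor, so Flink treats it as a general (Kryo-serialized) type.
public class Connection {
    private final String host;

    public Connection(String host) { // only a parameterized constructor
        this.host = host;
    }

    public String getHost() {
        return host;
    }
}
{% endhighlight %}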
-#### Values
+#### 值
-*Value* types describe their serialization and deserialization manually. Instead of going through a general purpose serialization framework, they provide custom code for those operations by means of implementing the `org.apache.flinktypes.Value` interface with the methods `read` and `write`. Using a Value type is reasonable when general purpose serialization would be highly inefficient. An example would be a data type that implements a sparse vector of elements as an array. Knowing that the array is mostly zero, one can use a special encoding for the non-zero elements, while the general purpose serialization would simply write all array elements.
+*值* 类型手工描述其序列化和反序列化。它们不通过通用的序列化框架,而是通过实现 `org.apache.flink.types.Value` 接口的 `read` 和 `write` 方法为这些操作提供自定义代码。当通用序列化效率非常低时,使用值类型是合理的。一个例子是用数组实现元素稀疏向量的数据类型:已知数组大部分元素为零,就可以对非零元素使用特殊编码,而通用序列化只会简单地写入所有数组元素。
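
A minimal sketch of the sparse-vector idea described above; the class name and wire format are illustrative assumptions, not from the page:

{% highlight java %}
import org.apache.flink.core.memory.DataInputView;
import org.apache.flink.core.memory.DataOutputView;
import org.apache.flink.types.Value;

import java.io.IOException;

// Illustrative sparse vector: only the non-zero entries are written.
public class SparseIntVector implements Value {
    private int size;
    private int[] indices = new int[0]; // positions of non-zero entries
    private int[] values = new int[0];  // the non-zero entries themselves

    @Override
    public void write(DataOutputView out) throws IOException {
        out.writeInt(size);
        out.writeInt(indices.length);
        for (int i = 0; i < indices.length; i++) {
            out.writeInt(indices[i]);
            out.writeInt(values[i]);
        }
    }

    @Override
    public void read(DataInputView in) throws IOException {
        size = in.readInt();
        int nonZeros = in.readInt();
        indices = new int[nonZeros];
        values = new int[nonZeros];
        for (int i = 0; i < nonZeros; i++) {
            indices[i] = in.readInt();
            values[i] = in.readInt();
        }
    }
}
{% endhighlight %}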
-The `org.apache.flinktypes.CopyableValue` interface supports manual internal cloning logic in a similar way.
+`org.apache.flink.types.CopyableValue` 接口以类似的方式支持手工实现的内部克隆逻辑。
-Flink comes with pre-defined Value types that correspond to basic data types. (`ByteValue`, `ShortValue`, `IntValue`, `LongValue`, `FloatValue`, `DoubleValue`, `StringValue`, `CharValue`, `BooleanValue`). These Value types act as mutable variants of the basic data types: Their value can be altered, allowing programmers to reuse objects and take pressure off the garbage collector.
+Flink 提供了与基本数据类型对应的预定义值类型(`ByteValue`、`ShortValue`、`IntValue`、`LongValue`、`FloatValue`、`DoubleValue`、`StringValue`、`CharValue`、`BooleanValue`)。这些值类型是基本数据类型的可变变体:它们的值可以改变,允许程序员重用对象,从而减轻垃圾回收器的压力。
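
A short sketch of the object-reuse pattern this enables (the variable name is illustrative):

{% highlight java %}
import org.apache.flink.types.IntValue;

// Reuse one mutable holder instead of allocating a new Integer per update.
IntValue count = new IntValue(0);
count.setValue(count.getValue() + 1); // mutate in place, no new object
{% endhighlight %}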
-#### Hadoop Writables
+#### Hadoop Writable
-You can use types that implement the `org.apache.hadoop.Writable` interface. The serialization logic defined in the `write()` and `readFields()` methods will be used for serialization.
+可以使用实现了 `org.apache.hadoop.io.Writable` 接口的类型。序列化时会使用其 `write()` 和 `readFields()` 方法中定义的序列化逻辑。
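
A minimal sketch of such a type, assuming a hypothetical `PageVisit` record:

{% highlight java %}
import org.apache.hadoop.io.Writable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical Writable: its own write()/readFields() define serialization.
public class PageVisit implements Writable {
    private String url = "";
    private long visits;

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(url);
        out.writeLong(visits);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        url = in.readUTF();
        visits = in.readLong();
    }
}
{% endhighlight %}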
-#### Special Types
+#### 特殊类型
-You can use special types, including Scala's `Either`, `Option`, and `Try`. The Java API has its own custom implementation of `Either`. Similarly to Scala's `Either`, it represents a value of two possible types, *Left* or *Right*. `Either` can be useful for error handling or operators that need to output two different types of records.
+可以使用特殊类型,包括 Scala 的 `Either`、`Option` 和 `Try`。Java API 有自己的 `Either` 自定义实现。与 Scala 的 `Either` 类似,它表示一个具有 *Left* 或 *Right* 两种可能类型的值。`Either` 可用于错误处理,或用于需要输出两种不同类型记录的算子。
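
A small sketch of the Java `Either`; the error-handling use is an illustrative assumption:

{% highlight java %}
import org.apache.flink.types.Either;

// Illustrative: Left carries an error message, Right a parsed value.
Either<String, Long> failed = Either.Left("malformed record");
Either<String, Long> parsed = Either.Right(42L);

if (parsed.isRight()) {
    long value = parsed.right(); // 42
}
{% endhighlight %}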
-#### Type Erasure & Type Inference
+#### 类型擦除和类型推断
-*Note: This Section is only relevant for Java.*
+*注意:本节仅与 Java 相关。*
-The Java compiler throws away much of the generic type information after compilation. This is known as *type erasure* in Java. It means that at runtime, an instance of an object does not know its generic type any more. For example, instances of `DataStream<String>` and `DataStream<Long>` look the same to the JVM.
+Java 编译器在编译后会丢弃大部分泛型类型信息。这在 Java 中被称作 *类型擦除*。这意味着在运行时,对象的实例不再知道自己的泛型类型。例如,`DataStream<String>` 和 `DataStream<Long>` 的实例在 JVM 看来是一样的。
-Flink requires type information at the time when it prepares the program for execution (when the main method of the program is called). The Flink Java API tries to reconstruct the type information that was thrown away in various ways and store it explicitly in the data sets and operators. You can retrieve the type via `DataStream.getType()`. The method returns an instance of `TypeInformation`, which is Flink's internal way of representing types.
+Flink 在准备程序执行时(程序的 main 方法被调用时)需要类型信息。Flink Java API 会尝试用各种方式重建被丢弃的类型信息,并将其显式存储在数据集和算子中。你可以通过 `DataStream.getType()` 获取类型。该方法返回 `TypeInformation` 的一个实例,这是 Flink 内部表示类型的方式。
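
A minimal sketch of retrieving the stored type (the stream contents are illustrative):

{% highlight java %}
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<String> words = env.fromElements("hello", "world");

// Flink's explicitly stored type descriptor, despite JVM erasure:
TypeInformation<String> type = words.getType();
{% endhighlight %}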
-The type inference has its limits and needs the "cooperation" of the programmer in some cases. Examples for that are methods that create data sets from collections, such as `ExecutionEnvironment.fromCollection()`, where you can pass an argument that describes the type. But also generic functions like `MapFunction<I, O>` may need extra type information.
+类型推断有其局限性,某些情况下需要程序员的"配合"。例如从集合创建数据集的方法 `ExecutionEnvironment.fromCollection()`,你可以在其中传递一个描述类型的参数。而像 `MapFunction<I, O>` 这样的泛型函数也可能需要额外的类型信息。
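
A hedged sketch of passing an explicit type to `fromCollection()` (the data values are illustrative):

{% highlight java %}
import java.util.Arrays;
import java.util.List;

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
List<Tuple2<String, Integer>> data = Arrays.asList(Tuple2.of("hello", 1));

// The explicit TypeInformation argument compensates for type erasure.
DataSet<Tuple2<String, Integer>> ds =
        env.fromCollection(data, Types.TUPLE(Types.STRING, Types.INT));
{% endhighlight %}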
-The {% gh_link /flink-core/src/main/java/org/apache/flink/api/java/typeutils/ResultTypeQueryable.java "ResultTypeQueryable" %} interface can be implemented by input formats and functions to tell the API explicitly about their return type. The *input types* that the functions are invoked with can usually be inferred by the result types of the previous operations.
+输入格式和函数可以实现 {% gh_link /flink-core/src/main/java/org/apache/flink/api/java/typeutils/ResultTypeQueryable.java "ResultTypeQueryable" %} 接口,以明确告知 API 其返回类型。函数被调用时的 *输入类型* 通常可以通过先前操作的结果类型推断出来。
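
A minimal sketch of a function implementing `ResultTypeQueryable` (the `CountOne` class is a hypothetical example):

{% highlight java %}
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.typeutils.ResultTypeQueryable;

public class CountOne implements MapFunction<String, Tuple2<String, Integer>>,
        ResultTypeQueryable<Tuple2<String, Integer>> {

    @Override
    public Tuple2<String, Integer> map(String word) {
        return Tuple2.of(word, 1);
    }

    @Override
    public TypeInformation<Tuple2<String, Integer>> getProducedType() {
        // the erased generic return type is declared explicitly here
        return Types.TUPLE(Types.STRING, Types.INT);
    }
}
{% endhighlight %}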
{% top %}
-Accumulators & Counters
+累加器和计数器
---------------------------
-Accumulators are simple constructs with an **add operation** and a **final accumulated result**, which is available after the job ended.
+累加器是带有 **加法操作** 和 **最终累加结果** 的简单结构,其结果在作业结束后可用。
-The most straightforward accumulator is a **counter**: You can increment it using the ```Accumulator.add(V value)``` method. At the end of the job Flink will sum up (merge) all partial results and send the result to the client. Accumulators are useful during debugging or if you quickly want to find out more about your data.
+最简单的累加器是 **计数器**:你可以使用 ```Accumulator.add(V value)``` 方法使其递增。在作业结束时,Flink 会把所有部分结果求和(合并),并将结果发送给客户端。累加器在调试时,或者在你想快速了解数据时非常有用。
-Flink currently has the following **built-in accumulators**. Each of them implements the {% gh_link /flink-core/src/main/java/org/apache/flink/api/common/accumulators/Accumulator.java "Accumulator" %} interface.
+Flink 目前有如下 **内置累加器**。它们每一个都实现了 {% gh_link /flink-core/src/main/java/org/apache/flink/api/common/accumulators/Accumulator.java "Accumulator" %} 接口。
- {% gh_link /flink-core/src/main/java/org/apache/flink/api/common/accumulators/IntCounter.java "__IntCounter__" %},
  {% gh_link /flink-core/src/main/java/org/apache/flink/api/common/accumulators/LongCounter.java "__LongCounter__" %}
- and {% gh_link /flink-core/src/main/java/org/apache/flink/api/common/accumulators/DoubleCounter.java "__DoubleCounter__" %}: See below for an example using a counter.
+ 和 {% gh_link /flink-core/src/main/java/org/apache/flink/api/common/accumulators/DoubleCounter.java "__DoubleCounter__" %}: 有关使用计数器的示例,请参见下文。
- {% gh_link /flink-core/src/main/java/org/apache/flink/api/common/accumulators/Histogram.java "__Histogram__" %}:
- A histogram implementation for a discrete number of bins. Internally it is just a map from Integer to Integer. You can use this to compute distributions of values, e.g. the distribution of words-per-line for a word count program.
+ 离散数量桶的直方图实现。其内部只是一个从 Integer 到 Integer 的映射。你可以用它来计算值的分布,例如词频统计程序中每行单词数的分布(参见此列表后的示例)。
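
A short sketch of the words-per-line use mentioned in the list above (the input line is illustrative):

{% highlight java %}
import org.apache.flink.api.common.accumulators.Histogram;

// Illustrative: one bucket per observed words-per-line value.
Histogram wordsPerLine = new Histogram();
wordsPerLine.add("to be or not to be".split(" ").length); // bucket 6
{% endhighlight %}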
-__How to use accumulators:__
+__如何使用累加器:__
-First you have to create an accumulator object (here a counter) in the user-defined transformation function where you want to use it.
+首先,你需要在想要使用累加器的用户自定义转换函数中创建一个累加器对象(此处是计数器)。
{% highlight java %}
private IntCounter numLines = new IntCounter();
{% endhighlight %}
-Second you have to register the accumulator object, typically in the ```open()``` method of the *rich* function. Here you also define the name.
+其次,你需要注册累加器对象,通常是在 *rich* 函数的 ```open()``` 方法中。还需要在此处定义累加器的名称。
{% highlight java %}
getRuntimeContext().addAccumulator("num-lines", this.numLines);
{% endhighlight %}
-You can now use the accumulator anywhere in the operator function, including in the ```open()``` and ```close()``` methods.
+现在你可以在算子函数中的任何位置使用该累加器,包括在 ```open()``` 和 ```close()``` 方法中。
{% highlight java %}
this.numLines.add(1);
{% endhighlight %}
-The overall result will be stored in the ```JobExecutionResult``` object which is returned from the `execute()` method of the execution environment (currently this only works if the execution waits for the completion of the job).
+总体结果会存储在 ```JobExecutionResult``` 对象中,该对象由执行环境的 `execute()` 方法返回(目前这只有在执行过程会等待作业完成时才有效)。
{% highlight java %}
myJobExecutionResult.getAccumulatorResult("num-lines")
{% endhighlight %}
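
Putting the fragments above together, a hedged sketch (the `LineCounter` class name is an assumption):

{% highlight java %}
import org.apache.flink.api.common.accumulators.IntCounter;
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Illustrative rich function counting every processed line.
public class LineCounter extends RichFlatMapFunction<String, String> {
    private final IntCounter numLines = new IntCounter();

    @Override
    public void open(Configuration parameters) {
        // register under the name later used to query the result
        getRuntimeContext().addAccumulator("num-lines", this.numLines);
    }

    @Override
    public void flatMap(String line, Collector<String> out) {
        this.numLines.add(1);
        out.collect(line);
    }
}
{% endhighlight %}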
-All accumulators share a single namespace per job. Thus you can use the same accumulator in different operator functions of your job. Flink will internally merge all accumulators with the same name.
+每个作业的所有累加器共享同一个命名空间。因此你可以在作业的不同算子函数中使用同一个累加器。Flink 会在内部合并所有同名的累加器。
-A note on accumulators and iterations: Currently the result of accumulators is only available after the overall job has ended. We plan to also make the result of the previous iteration available in the next iteration. You can use {% gh_link /flink-java/src/main/java/org/apache/flink/api/java/operators/IterativeDataSet.java#L98 "Aggregators" %} to compute per-iteration statistics and base the termination of iterations on such statistics.
+关于累加器和迭代的说明:目前累加器的结果只有在整个作业结束后才可用。我们也计划让前一次迭代的结果可以在下一次迭代中使用。你可以使用 {% gh_link /flink-java/src/main/java/org/apache/flink/api/java/operators/IterativeDataSet.java#L98 "Aggregators" %} 来计算每次迭代的统计信息,并基于这些统计信息来决定迭代的终止。
-__Custom accumulators:__
+__自定义累加器:__
-To implement your own accumulator you simply have to write your implementation of the Accumulator interface. Feel free to create a pull request if you think your custom accumulator should be shipped with Flink.
+要实现自己的累加器,只需编写 Accumulator 接口的实现即可。如果你认为你的自定义累加器应当随 Flink 一起发布,欢迎创建 pull request。
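
A minimal sketch of such a custom accumulator, assuming a hypothetical max-tracking use case:

{% highlight java %}
import org.apache.flink.api.common.accumulators.Accumulator;

// Hypothetical accumulator tracking the maximum long value seen.
public class MaxLong implements Accumulator<Long, Long> {
    private long max = Long.MIN_VALUE;

    @Override
    public void add(Long value) {
        max = Math.max(max, value);
    }

    @Override
    public Long getLocalValue() {
        return max;
    }

    @Override
    public void resetLocal() {
        max = Long.MIN_VALUE;
    }

    @Override
    public void merge(Accumulator<Long, Long> other) {
        // called by Flink when combining partial results
        max = Math.max(max, other.getLocalValue());
    }

    @Override
    public Accumulator<Long, Long> clone() {
        MaxLong copy = new MaxLong();
        copy.max = this.max;
        return copy;
    }
}
{% endhighlight %}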
Review comment:
👍
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services