xintongsong commented on a change in pull request #9677: [FLINK-13979] [docs-zh] Translate new streamfilesink docs to chinese
URL: https://github.com/apache/flink/pull/9677#discussion_r324549631
 
 

 ##########
 File path: docs/dev/connectors/streamfile_sink.zh.md
 ##########
 @@ -28,80 +28,69 @@ under the License.
 
This connector provides a Sink that writes partitioned files into file systems supported by the [Flink `FileSystem`]({{ site.baseurl}}/zh/ops/filesystems/index.html) interface.
 
-In order to handle unbounded data streams, the streaming file sink writes incoming data
-into buckets. The bucketing behaviour is fully configurable with a default time-based
-bucketing where we start writing a new bucket every hour and thus get files that correspond to
-records received during certain time intervals from the stream.
+To handle unbounded streams, the streaming file sink writes incoming data into buckets. How data is bucketed is configurable; the default strategy is time-based bucketing, which creates and writes into a new bucket every hour, yielding files that each correspond to the records received from the stream during a particular time interval.
 
-The bucket directories themselves contain several part files with the actual output data, with at least
-one for each subtask of the sink that has received data for the bucket. Additional part files will be created
-according to the configurable rolling policy. The default policy rolls files based on size, a timeout that specifies the maximum duration for which a file can be open, and a maximum inactivity timeout after which the file is closed.
+For each sink subtask that has received data for a bucket, the bucket contains at least one part file that is currently receiving data. Additional part files are created according to the rolling policy, which is configurable. The default policy rolls files based on file size and on timeouts: the maximum duration for which a file may stay open, and the maximum inactivity timeout after which the file is closed.
 
  <div class="alert alert-info">
-     <b>IMPORTANT:</b> Checkpointing needs to be enabled when using the StreamingFileSink. Part files can only be finalized
-     on successful checkpoints. If checkpointing is disabled part files will forever stay in `in-progress` or `pending` state
-     and cannot be safely read by downstream systems.
+     <b>IMPORTANT:</b> Checkpointing must be enabled when using the Streaming File Sink; part files can only be finalized on successful checkpoints. If checkpointing is disabled, part files will stay in the `in-progress` or `pending` state forever and cannot be safely read by downstream systems.
  </div>
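To make the checkpointing requirement concrete, here is a minimal sketch of a row-format sink, assuming the Flink 1.9-era `StreamingFileSink` API (the output path, checkpoint interval, and input elements are hypothetical):

```java
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Part files can only be finalized on successful checkpoints, so checkpointing must be on.
env.enableCheckpointing(60_000L); // hypothetical interval: one checkpoint per minute

DataStream<String> input = env.fromElements("a", "b", "c"); // hypothetical input stream

StreamingFileSink<String> sink = StreamingFileSink
    .forRowFormat(new Path("/base/output-dir"), new SimpleStringEncoder<String>("UTF-8"))
    .build();

input.addSink(sink);
```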
 
 <img src="{{ site.baseurl }}/fig/streamfilesink_bucketing.png" class="center" style="width: 100%;" />
 
-### Bucket Assignment
+### Bucket Assignment
 
-The bucketing logic defines how the data will be structured into subdirectories inside the base output directory.
+The bucket assignment logic defines how data is structured into subdirectories under the base output directory.
 
-Both row and bulk formats use the [DateTimeBucketAssigner]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/bucketassigners/DateTimeBucketAssigner.html) as the default assigner.
-By default the DateTimeBucketAssigner creates hourly buckets based on the system default timezone
-with the following format: `yyyy-MM-dd--HH`. Both the date format (i.e. bucket size) and timezone can be
-configured manually.
+Both row and bulk formats use [DateTimeBucketAssigner]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/bucketassigners/DateTimeBucketAssigner.html) as the default assigner.
+By default, the DateTimeBucketAssigner creates one bucket per hour based on the system default timezone, using the format `yyyy-MM-dd--HH`. Both the date format (i.e. the bucket size) and the timezone can be configured manually.
 
-We can specify a custom [BucketAssigner]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/BucketAssigner.html) by calling `.withBucketAssigner(assigner)` on the format builders.
+We can customize the [BucketAssigner]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/BucketAssigner.html) by calling `.withBucketAssigner(assigner)` on the format builders, as sketched below.
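As referenced above, a sketch of supplying a custom assigner; the builder call is from the docs, while the daily format string and explicit timezone are hypothetical choices passed to the `DateTimeBucketAssigner` constructor:

```java
import java.time.ZoneId;

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.DateTimeBucketAssigner;

StreamingFileSink<String> sink = StreamingFileSink
    .forRowFormat(new Path("/base/output-dir"), new SimpleStringEncoder<String>("UTF-8"))
    // Daily buckets in an explicit timezone instead of the hourly, system-timezone default.
    .withBucketAssigner(new DateTimeBucketAssigner<>("yyyy-MM-dd", ZoneId.of("UTC")))
    .build();
```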
 
-Flink comes with two built in BucketAssigners:
+Flink has two built-in BucketAssigners:
 
- - [DateTimeBucketAssigner]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/bucketassigners/DateTimeBucketAssigner.html) : Default time based assigner
- - [BasePathBucketAssigner]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/bucketassigners/BasePathBucketAssigner.html) : Assigner that stores all part files in the base path (single global bucket)
+ - [DateTimeBucketAssigner]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/bucketassigners/DateTimeBucketAssigner.html): the default time-based assigner
+ - [BasePathBucketAssigner]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/bucketassigners/BasePathBucketAssigner.html): an assigner that stores all part files in the base path (a single global bucket)
 
-### Rolling Policy
+### Rolling Policy
 
-The [RollingPolicy]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/RollingPolicy.html) defines when a given in-progress part file will be closed and moved to the pending and later to finished state.
-In combination with the checkpointing interval (pending files become finished on the next checkpoint) this controls how quickly
-part files become available for downstream readers and also the size and number of these parts.
+The [RollingPolicy]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/RollingPolicy.html) defines when a given in-progress part file is closed and moved to the pending and later the finished state. Combined with the checkpointing interval (pending files become finished on the next checkpoint), this controls how quickly part files become available to downstream readers, as well as their size and number. A configuration sketch follows the list below.
 
-Flink comes with two built-in RollingPolicies:
+Flink has two built-in RollingPolicies:
 
 - [DefaultRollingPolicy]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/rollingpolicies/DefaultRollingPolicy.html)
 - [OnCheckpointRollingPolicy]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/rollingpolicies/OnCheckpointRollingPolicy.html)
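As referenced above, a minimal sketch of tuning the DefaultRollingPolicy on a row-format builder, assuming the Flink 1.9-era API (the size and time thresholds are hypothetical):

```java
import java.util.concurrent.TimeUnit;

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy;

StreamingFileSink<String> sink = StreamingFileSink
    .forRowFormat(new Path("/base/output-dir"), new SimpleStringEncoder<String>("UTF-8"))
    .withRollingPolicy(
        DefaultRollingPolicy.create()
            .withMaxPartSize(128 * 1024 * 1024)                   // roll when a part file reaches 128 MB
            .withRolloverInterval(TimeUnit.MINUTES.toMillis(15))  // roll after a file has been open for 15 min
            .withInactivityInterval(TimeUnit.MINUTES.toMillis(5)) // roll after 5 min without new records
            .build())
    .build();
```

OnCheckpointRollingPolicy, by contrast, rolls the in-progress part file on every checkpoint, so no thresholds need to be configured for it.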
 
-### Part file lifecycle
+### Part file lifecycle
 
-In order to use the output of the StreamingFileSink in downstream systems, we need to understand the naming and lifecycle of the output files produced.
+To use the output of the StreamingFileSink in downstream systems, we need to understand the naming scheme and lifecycle of the output files it produces.
 
-Part files can be in one of three states:
- 1. **In-progress** : The part file that is currently being written to is in-progress
- 2. **Pending** : Once a part file is closed for writing it becomes pending
- 3. **Finished** : On successful checkpoints pending files become finished
+Part files can be in one of three states:
+ 1. **In-progress**: the part file is currently being written to
+ 2. **Pending**: once an in-progress file is closed for writing, it becomes pending
+ 3. **Finished**: after a successful checkpoint, pending files become finished
 
-Only finished files are safe to read by downstream systems as those are guaranteed to not be modified later. Finished files can be distinguished by their naming scheme only.
+Files in the finished state will never be modified afterwards and can therefore be read safely by downstream systems. Finished files can be distinguished only by their naming scheme.
 
-File naming schemes:
- - **In-progress / Pending**: `part-subtaskIndex-partFileIndex.inprogress.uid`
- - **Finished:** `part-subtaskIndex-partFileIndex`
+File naming schemes:
+ - **In-progress / Pending**: `part-subtaskIndex-partFileIndex.inprogress.uid`
+ - **Finished**: `part-subtaskIndex-partFileIndex`
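For illustration, a hypothetical listing of one hourly bucket; the index values are made up and `uid` stands in for the real per-file identifier:

```
/base/output-dir/2019-09-15--10/
    part-0-0                   (finished)
    part-0-1.inprogress.uid    (in-progress or pending)
    part-1-0                   (finished)
```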
 
-Part file indexes are strictly increasing for any given subtask (in the order they were created). However these indexes are not always sequential. When the job restarts, the next part index for all subtasks will be the `max part index + 1`.
+For any given subtask, part file indexes increase strictly (in creation order). However, these indexes are not always sequential: when the job restarts, the next part file index for all subtasks will be `max part index + 1`.
 
-Each writer subtask will have a single in-progress part file at any given time for every active bucket, but there can be several pending and finished files.
+For every active bucket, each writer subtask has only one in-progress part file at any given time, but there may be several pending and finished part files.
 
Review comment:
   "写入Subtask" ("writer subtask") is ambiguous: from the Chinese alone, the reader may wonder whether the subtask writes data elsewhere (to the part files) or whether data is written into the subtask from somewhere else. The subtask here should be the writer.
