wuchong commented on a change in pull request #8918:
[FLINK-12944][docs]Translate Streaming File Sink page into Chinese
URL: https://github.com/apache/flink/pull/8918#discussion_r298822925
##########
File path: docs/dev/connectors/streamfile_sink.zh.md
##########
@@ -84,55 +77,41 @@ input.addSink(sink)
</div>
</div>
-This will create a streaming sink that creates hourly buckets and uses a
-default rolling policy. The default bucket assigner is
+The code above creates a streaming sink that buckets data by the hour and rolls files according to the default policy. The default bucket assigner is
[DateTimeBucketAssigner]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/bucketassigners/DateTimeBucketAssigner.html)
-and the default rolling policy is
-[DefaultRollingPolicy]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/rollingpolicies/DefaultRollingPolicy.html).
-You can specify a custom
+and the default rolling policy is
+[DefaultRollingPolicy]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/rollingpolicies/DefaultRollingPolicy.html).
+On the sink builder you can specify a custom
[BucketAssigner]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/BucketAssigner.html)
-and
-[RollingPolicy]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/RollingPolicy.html)
-on the sink builder. Please check out the JavaDoc for
-[StreamingFileSink]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/StreamingFileSink.html)
-for more configuration options and more documentation about the workings and
-interactions of bucket assigners and rolling policies.
+and
+[RollingPolicy]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/RollingPolicy.html).
+For more configuration options, and for details on how bucket assigners and rolling policies work and interact with each other, please refer to
+[StreamingFileSink]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/StreamingFileSink.html).
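(For illustration only, not part of the page being translated: a minimal Java sketch of customizing the bucket assigner and rolling policy on the builder could look like the following. The output path, date-time pattern, and rollover thresholds are placeholder assumptions.)

```java
import java.util.concurrent.TimeUnit;

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.DateTimeBucketAssigner;
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy;

public class CustomizedRowFormatSink {

    /** Row-format sink with a non-default bucket assigner and rolling policy. */
    public static StreamingFileSink<String> create(String outputPath) {
        return StreamingFileSink
            .forRowFormat(new Path(outputPath), new SimpleStringEncoder<String>("UTF-8"))
            // Bucket by minute instead of the default hourly "yyyy-MM-dd--HH" pattern.
            .withBucketAssigner(new DateTimeBucketAssigner<>("yyyy-MM-dd--HH-mm"))
            // Roll every 15 minutes, after 5 minutes of inactivity,
            // or once a part file reaches 128 MB.
            .withRollingPolicy(
                DefaultRollingPolicy.create()
                    .withRolloverInterval(TimeUnit.MINUTES.toMillis(15))
                    .withInactivityInterval(TimeUnit.MINUTES.toMillis(5))
                    .withMaxPartSize(128 * 1024 * 1024)
                    .build())
            .build();
    }
}
```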
-#### Using Bulk-encoded Output Formats
+#### Using Bulk-encoded Output Formats
-In the above example we used an `Encoder` that can encode or serialize each
-record individually. The streaming file sink also supports bulk-encoded output
-formats such as [Apache Parquet](http://parquet.apache.org). To use these,
-instead of `StreamingFileSink.forRowFormat()` you would use
-`StreamingFileSink.forBulkFormat()` and specify a `BulkWriter.Factory`.
+In the example above an `Encoder` is used to serialize each record individually. Besides that, the streaming file sink also supports bulk-encoded output
+formats such as [Apache Parquet](http://parquet.apache.org). To use such a format, use `StreamingFileSink.forBulkFormat()`
+instead of `StreamingFileSink.forRowFormat()` and specify a `BulkWriter.Factory`.
[ParquetAvroWriters]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/formats/parquet/avro/ParquetAvroWriters.html)
+contains static methods for creating a `BulkWriter.Factory` for various types.
<div class="alert alert-info">
- <b>IMPORTANT:</b> Bulk-encoding formats can only be combined with the
- `OnCheckpointRollingPolicy`, which rolls the in-progress part file on
- every checkpoint.
+  <b>IMPORTANT:</b> Bulk-encoding formats can only be used together with the
+  `OnCheckpointRollingPolicy`, which rolls the in-progress part file on every checkpoint.
</div>
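(For illustration only: a hedged sketch of a bulk-encoded Parquet sink built with `StreamingFileSink.forBulkFormat()` and `ParquetAvroWriters.forGenericRecord()`. The output path is a placeholder and the snippet assumes the flink-parquet and Avro dependencies are on the classpath.)

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;

import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class ParquetBulkSink {

    /** Bulk-encoded Parquet sink for Avro GenericRecords. */
    public static StreamingFileSink<GenericRecord> create(String outputPath, Schema schema) {
        return StreamingFileSink
            // Bulk format: forBulkFormat() takes a BulkWriter.Factory instead of an Encoder.
            .forBulkFormat(new Path(outputPath), ParquetAvroWriters.forGenericRecord(schema))
            // No rolling policy is set here: bulk formats roll the in-progress
            // part file on every checkpoint (OnCheckpointRollingPolicy).
            .build();
    }
}
```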
-#### Important Considerations for S3
-
-<span class="label label-danger">Important Note 1</span>: For S3, the `StreamingFileSink`
-supports only the [Hadoop-based](https://hadoop.apache.org/) FileSystem implementation, not
-the implementation based on [Presto](https://prestodb.io/). In case your job uses the
-`StreamingFileSink` to write to S3 but you want to use the Presto-based one for checkpointing,
-it is advised to use explicitly *"s3a://"* (for Hadoop) as the scheme for the target path of
-the sink and *"s3p://"* for checkpointing (for Presto). Using *"s3://"* for both the sink
-and checkpointing may lead to unpredictable behavior, as both implementations "listen" to that scheme.
-
-<span class="label label-danger">Important Note 2</span>: To guarantee exactly-once semantics while
-being efficient, the `StreamingFileSink` uses the [Multi-part Upload](https://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html)
-feature of S3 (MPU from now on). This feature allows to upload files in independent chunks (thus the "multi-part")
-which can be combined into the original file when all the parts of the MPU are successfully uploaded.
-For inactive MPUs, S3 supports a bucket lifecycle rule that the user can use to abort multipart uploads
-that don't complete within a specified number of days after being initiated. This implies that if you set this rule
-aggressively and take a savepoint with some part-files being not fully uploaded, their associated MPUs may time-out
-before the job is restarted. This will result in your job not being able to restore from that savepoint as the
-pending part-files are no longer there and Flink will fail with an exception as it tries to fetch them and fails.
+#### Important Considerations for S3
+
+<span class="label label-danger">Important Note 1</span>: For S3, the `StreamingFileSink` only supports the file system
+implementation based on [Hadoop](https://hadoop.apache.org/), not the one based on [Presto](https://prestodb.io/). If you want to use the `StreamingFileSink` to write data to S3 while keeping its
+checkpoints on the Presto-based file system, it is advised to explicitly specify the *"s3a://"* scheme in the sink path and *"s3p://"* in the checkpoint path
Review comment:
```suggestion
checkpoints on the Presto-based file system, it is advised to explicitly specify *"s3a://"* (for Hadoop) as the scheme for the sink's target path, and
to explicitly specify *"s3p://"* (for Presto) for the checkpoint path
```
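(For illustration, a minimal sketch of the scheme split described above: sink output goes through the Hadoop-based *"s3a://"* file system while checkpoints go through the Presto-based *"s3p://"* one. The bucket names, checkpoint interval, and socket source are placeholder assumptions, not part of the page.)

```java
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class S3SchemeExample {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoints use the Presto-based S3 file system via the "s3p://" scheme.
        env.enableCheckpointing(60_000);
        env.setStateBackend(new FsStateBackend("s3p://my-bucket/checkpoints"));

        // The StreamingFileSink writes through the Hadoop-based S3 file system
        // via the "s3a://" scheme, the only implementation it supports.
        StreamingFileSink<String> sink = StreamingFileSink
            .forRowFormat(new Path("s3a://my-bucket/output"), new SimpleStringEncoder<String>("UTF-8"))
            .build();

        env.socketTextStream("localhost", 9999).addSink(sink);

        env.execute("StreamingFileSink with split S3 schemes");
    }
}
```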
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services