MrWhiteSike commented on a change in pull request #18718:
URL: https://github.com/apache/flink/pull/18718#discussion_r807518027
##########
File path: docs/content.zh/docs/connectors/datastream/filesystem.md
##########
@@ -231,72 +222,68 @@ new HiveSource<>(
```
{{< /tab >}}
{{< /tabs >}}
+<a name="current-limitations"></a>
-### Current Limitations
+### 当前限制
-Watermarking does not work very well for large backlogs of files. This is because watermarks eagerly advance within a file, and the next file might contain data later than the watermark.
+对于大量积压的文件, Watermark 效果不佳。这是因为 Watermark 急切地在一个文件中前进,而下一个文件可能包含比 Watermark 更晚的数据。
-For Unbounded File Sources, the enumerator currently remembers paths of all already processed files, which is a state that can, in some cases, grow rather large.
-There are plans to add a compressed form of tracking already processed files in the future (for example, by keeping modification timestamps below boundaries).
+对于无界文件源,枚举器会记住当前所有已处理文件的路径,在某些情况下,这种状态可能会变得相当大。
+计划在未来增加一种压缩的方式来跟踪已经处理的文件(例如,将修改时间戳保持在边界以下)。
+<a name="behind-the-scenes"></a>
-### Behind the Scenes
+### 后话
{{< hint info >}}
-If you are interested in how File Source works through the new data source API design, you may
-want to read this part as a reference. For details about the new data source API, check out the
-[documentation on data sources]({{< ref "docs/dev/datastream/sources.md" >}}) and
+如果你对新设计的数据源 API 中的文件源是如何工作的感兴趣,可以阅读本部分作为参考。关于新的数据源 API 的更多细节,请参考
+[documentation on data sources]({{< ref "docs/dev/datastream/sources.md" >}}) 和在
<a href="https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface">FLIP-27</a>
-for more descriptive discussions.
+中获取更加具体的讨论详情。
{{< /hint >}}
+<a name="file-sink"></a>
-## File Sink
+## 文件 Sink
-The file sink writes incoming data into buckets. Given that the incoming streams can be unbounded,
-data in each bucket is organized into part files of finite size. The bucketing behaviour is fully configurable
-with a default time-based bucketing where we start writing a new bucket every hour. This means that each resulting
-bucket will contain files with records received during 1 hour intervals from the stream.
+文件 Sink 将传入的数据写入存储桶中。考虑到输入流可以是无界的,每个桶中的数据被组织成有限大小的 Part 文件。
+往桶中写数据的行为完全可默认配置成基于时间的,比如我们可以设置每个小时的数据写入一个新桶中。这意味着桶中将包含一个小时间隔内接收到的记录。
Review comment:
```suggestion
完全可以配置为基于时间的方式往桶中写入数据,比如我们可以设置每个小时的数据写入一个新桶中。这意味着桶中将包含一个小时间隔内接收到的记录。
```
What do you think?
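
For context, the "Unbounded File Sources" limitation in the hunk above concerns the continuous-monitoring mode of the new `FileSource`. Below is a minimal sketch of that mode, assuming the standard builder API; the input path is hypothetical, and in some older Flink versions the text-line format class is named `TextLineFormat` rather than `TextLineInputFormat`:

```java
import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.reader.TextLineInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ContinuousFileSourceExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Unbounded mode: re-scan the input path every 30 seconds. To avoid
        // reading a re-discovered file twice, the split enumerator remembers
        // the paths of all already-processed files; that set is the state
        // which, as the section above notes, can grow rather large.
        FileSource<String> source =
                FileSource.forRecordStreamFormat(
                                new TextLineInputFormat(),
                                new Path("/tmp/input")) // hypothetical input directory
                        .monitorContinuously(Duration.ofSeconds(30))
                        .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "file-source").print();
        env.execute("continuous-file-source");
    }
}
```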