MrWhiteSike commented on a change in pull request #18718:
URL: https://github.com/apache/flink/pull/18718#discussion_r807499588
##########
File path: docs/content.zh/docs/connectors/datastream/filesystem.md
##########
@@ -231,72 +222,68 @@ new HiveSource<>(
```
{{< /tab >}}
{{< /tabs >}}
+<a name="current-limitations"></a>
-### Current Limitations
+### 当前限制
-Watermarking does not work very well for large backlogs of files. This is because watermarks eagerly advance within a file, and the next file might contain data later than the watermark.
+对于大量积压的文件, Watermark 效果不佳。这是因为 Watermark 急切地在一个文件中前进,而下一个文件可能包含比 Watermark 更晚的数据。
-For Unbounded File Sources, the enumerator currently remembers paths of all already processed files, which is a state that can, in some cases, grow rather large.
-There are plans to add a compressed form of tracking already processed files in the future (for example, by keeping modification timestamps below boundaries).
+对于无界文件源,枚举器会记住当前所有已处理文件的路径,在某些情况下,这种状态可能会变得相当大。
+计划在未来增加一种压缩的方式来跟踪已经处理的文件(例如,将修改时间戳保持在边界以下)。
Review comment:
```suggestion
未来计划将引入一种压缩的方式来跟踪已经处理的文件(例如,将修改时间戳保持在边界以下)。
```
What do you think about it?