[
https://issues.apache.org/jira/browse/FLINK-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aljoscha Krettek closed FLINK-3515.
-----------------------------------
Resolution: Duplicate
> Make the "file monitoring source" exactly-once
> ----------------------------------------------
>
> Key: FLINK-3515
> URL: https://issues.apache.org/jira/browse/FLINK-3515
> Project: Flink
> Issue Type: Improvement
> Components: Streaming
> Affects Versions: 0.10.2
> Reporter: Stephan Ewen
>
> The stream source that watches directories for changes is currently not
> "exactly-once".
> To make it exactly-once, the source (which generates the files to be read) and
> the flatMap (which reads the files) need to keep track of where they were at
> the point of a checkpoint.
> Assuming that files do not change after creation (HDFS / S3 style), we can
> do this as follows:
> - The source can track the files it has already emitted downstream via the
> file creation/modification timestamp, assuming that new files always get
> newer timestamps.
> - The flatMappers always need to store the path of their current file
> split, plus the byte offset they had reached within that split.
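
The two kinds of state described above can be sketched in plain Java. This is only an illustration of the idea, not Flink code: the class and method names (`ExactlyOnceFileSourceSketch`, `MonitorState`, `ReaderState`, `Position`, `selectNewFiles`, `snapshot`, `restore`) are hypothetical, and the sketch assumes, as the issue does, that files are immutable after creation and that new files always get strictly newer timestamps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ExactlyOnceFileSourceSketch {

    /** Source side: remembers the newest file timestamp already emitted downstream. */
    static final class MonitorState {
        private long maxEmittedTimestamp = Long.MIN_VALUE;

        /**
         * Takes a directory listing (path -> modification timestamp) and returns the
         * paths that are strictly newer than anything emitted so far, oldest first.
         * Assumption: new files always get strictly newer timestamps.
         */
        List<String> selectNewFiles(Map<String, Long> listing) {
            List<Map.Entry<String, Long>> fresh = new ArrayList<>();
            for (Map.Entry<String, Long> entry : listing.entrySet()) {
                if (entry.getValue() > maxEmittedTimestamp) {
                    fresh.add(entry);
                }
            }
            fresh.sort(Map.Entry.comparingByValue());

            List<String> paths = new ArrayList<>();
            for (Map.Entry<String, Long> entry : fresh) {
                paths.add(entry.getKey());
                maxEmittedTimestamp = Math.max(maxEmittedTimestamp, entry.getValue());
            }
            return paths;
        }

        /** State to write into a checkpoint. */
        long snapshot() {
            return maxEmittedTimestamp;
        }

        /** Rolls the source back to a checkpointed state after a failure. */
        void restore(long checkpointedTimestamp) {
            maxEmittedTimestamp = checkpointedTimestamp;
        }
    }

    /** Immutable (path, byte offset) pair stored in a checkpoint by a reader. */
    static final class Position {
        final String path;
        final long offset;

        Position(String path, long offset) {
            this.path = path;
            this.offset = offset;
        }
    }

    /** Reader side: tracks the current file split and the byte offset reached in it. */
    static final class ReaderState {
        private String currentPath;
        private long byteOffset;

        void open(String path) {
            currentPath = path;
            byteOffset = 0L;
        }

        void advance(long bytesRead) {
            byteOffset += bytesRead;
        }

        Position snapshot() {
            return new Position(currentPath, byteOffset);
        }

        void restore(Position checkpointed) {
            currentPath = checkpointed.path;
            byteOffset = checkpointed.offset;
        }
    }

    public static void main(String[] args) {
        MonitorState monitor = new MonitorState();
        Map<String, Long> listing = new HashMap<>();
        listing.put("/data/part-0", 100L);
        System.out.println("first scan: " + monitor.selectNewFiles(listing));

        long checkpoint = monitor.snapshot();
        listing.put("/data/part-1", 200L);
        System.out.println("second scan: " + monitor.selectNewFiles(listing));

        // After a failure, restoring the checkpoint makes the source re-emit part-1 only.
        monitor.restore(checkpoint);
        System.out.println("after restore: " + monitor.selectNewFiles(listing));
    }
}
```

On recovery, the source resumes from the restored timestamp and each reader would seek to the restored offset in the restored path, so records before the checkpoint are not re-read. A real implementation would also have to handle files that share a timestamp, which this sketch does not.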