[
https://issues.apache.org/jira/browse/BEAM-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eugene Kirpichov closed BEAM-3030.
----------------------------------
Resolution: Fixed
> watchForNewFiles() can emit a file multiple times if it's growing
> -----------------------------------------------------------------
>
> Key: BEAM-3030
> URL: https://issues.apache.org/jira/browse/BEAM-3030
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Reporter: Eugene Kirpichov
> Assignee: Eugene Kirpichov
> Fix For: 2.3.0
>
>
> TextIO and AvroIO watchForNewFiles(), as well as
> FileIO.match().continuously(), use the Watch transform under the hood to watch
> the set of Metadata matching a filepattern.
> Two Metadata objects with the same filename but different sizes are not
> considered equal, so if these transforms observe the same file multiple times
> with different sizes, they'll read the file multiple times.
> This is likely not yet a problem for production users, because these features
> require SDF, which is supported only in the Dataflow runner, and users of the
> Dataflow runner are likely to use only files on GCS, which doesn't support
> appends. However, this still needs to be fixed.
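
The duplicate emission described above can be illustrated with a minimal,
Beam-free sketch. Everything in it is hypothetical: FileInfo is a stand-in for
a file-match record (not Beam's Metadata class), and emitNew() only mimics the
idea of "emit anything not seen before" rather than the Watch transform itself.
Deduplicating on the filename alone is shown as one possible remedy; the issue
text does not say how the actual fix was implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.function.Function;

public class GrowingFileDedup {
    // Hypothetical stand-in for a file-match record; NOT Beam's Metadata class.
    static final class FileInfo {
        final String name;
        final long sizeBytes;

        FileInfo(String name, long sizeBytes) {
            this.name = name;
            this.sizeBytes = sizeBytes;
        }

        // Equality covers both fields, so a grown file looks like a new record.
        @Override
        public boolean equals(Object o) {
            if (!(o instanceof FileInfo)) {
                return false;
            }
            FileInfo other = (FileInfo) o;
            return name.equals(other.name) && sizeBytes == other.sizeBytes;
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, sizeBytes);
        }
    }

    // Emits each observation whose dedup key has not been seen before.
    static <K> List<FileInfo> emitNew(
            List<FileInfo> observations, Function<FileInfo, K> keyFn) {
        Set<K> seen = new HashSet<>();
        List<FileInfo> emitted = new ArrayList<>();
        for (FileInfo f : observations) {
            if (seen.add(keyFn.apply(f))) {
                emitted.add(f);
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        // The same file is observed twice while it grows, plus one other file.
        List<FileInfo> polls = Arrays.asList(
                new FileInfo("log.txt", 100),
                new FileInfo("log.txt", 250),
                new FileInfo("other.txt", 10));

        // Keyed on the whole record: log.txt is emitted twice.
        System.out.println(emitNew(polls, f -> f).size());      // prints 3
        // Keyed on the filename only: each file is emitted once.
        System.out.println(emitNew(polls, f -> f.name).size()); // prints 2
    }
}
```

With the filename as the key, the first observed version of a file is the one
that gets emitted; later, larger versions are ignored.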