[ https://issues.apache.org/jira/browse/BEAM-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16269802#comment-16269802 ]
Eugene Kirpichov commented on BEAM-3030:
----------------------------------------
This also happens in FileIOTest:
https://builds.apache.org/job/beam_PostCommit_Java_MavenInstall/org.apache.beam$beam-runners-direct-java/5317/testReport/junit/org.apache.beam.sdk.io/FileIOTest/testMatchWatchForNewFiles/
> watchForNewFiles() can emit a file multiple times if it's growing
> -----------------------------------------------------------------
>
> Key: BEAM-3030
> URL: https://issues.apache.org/jira/browse/BEAM-3030
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Reporter: Eugene Kirpichov
> Assignee: Eugene Kirpichov
> Fix For: 2.3.0
>
>
> TextIO and AvroIO watchForNewFiles(), as well as
> FileIO.match().continuously(), use the Watch transform under the hood and
> watch the set of Metadata objects matching a filepattern.
> Two Metadata objects with the same filename but different sizes are not
> considered equal, so if these transforms observe the same file multiple times
> with different sizes, they'll read the file multiple times.
> This is likely not yet a problem for production users: these features
> require SDF, which is supported only in the Dataflow runner, and users of the
> Dataflow runner are likely to use only files on GCS, which doesn't support
> appends. However, it still needs to be fixed.
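
The behavior described in the issue can be illustrated with a minimal, self-contained sketch. It does not use Beam: FileMeta is a hypothetical stand-in for Metadata, and emitByMetadata / emitByName are made-up helpers that simulate successive polls of a filepattern. Deduplicating on the filename alone is shown only as one possible way to avoid the duplicate, not necessarily the fix adopted in Beam.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

class WatchDedupSketch {
    /** Hypothetical stand-in for file metadata: equality covers name AND size. */
    static final class FileMeta {
        final String name;
        final long sizeBytes;

        FileMeta(String name, long sizeBytes) {
            this.name = name;
            this.sizeBytes = sizeBytes;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof FileMeta)) {
                return false;
            }
            FileMeta other = (FileMeta) o;
            return name.equals(other.name) && sizeBytes == other.sizeBytes;
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, sizeBytes);
        }
    }

    /** Dedups on the whole metadata value, so a file that grew looks like a new element. */
    static List<String> emitByMetadata(List<List<FileMeta>> polls) {
        Set<FileMeta> seen = new HashSet<>();
        List<String> emitted = new ArrayList<>();
        for (List<FileMeta> poll : polls) {
            for (FileMeta m : poll) {
                if (seen.add(m)) {
                    emitted.add(m.name);
                }
            }
        }
        return emitted;
    }

    /** Dedups on the filename only, so each file is emitted at most once. */
    static List<String> emitByName(List<List<FileMeta>> polls) {
        Set<String> seen = new HashSet<>();
        List<String> emitted = new ArrayList<>();
        for (List<FileMeta> poll : polls) {
            for (FileMeta m : poll) {
                if (seen.add(m.name)) {
                    emitted.add(m.name);
                }
            }
        }
        return emitted;
    }

    /** Two polls: a.txt grows from 10 to 25 bytes between them, and b.txt appears. */
    static List<List<FileMeta>> samplePolls() {
        return Arrays.asList(
            Arrays.asList(new FileMeta("a.txt", 10)),
            Arrays.asList(new FileMeta("a.txt", 25), new FileMeta("b.txt", 5)));
    }

    public static void main(String[] args) {
        System.out.println(emitByMetadata(samplePolls())); // [a.txt, a.txt, b.txt]
        System.out.println(emitByName(samplePolls()));     // [a.txt, b.txt]
    }
}
```

Note that keying on the filename alone would also ignore a file that is later replaced with different contents. Whether that trade-off is acceptable depends on the use case.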