[
https://issues.apache.org/jira/browse/BEAM-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yi Hu updated BEAM-14267:
-------------------------
Description:
In TextIO and AvroIO, we have a configuration option called watchForNewFiles,
and in FileIO.MatchConfiguration, we have an option called watchInterval.
Currently, these options match all files that satisfy the filtering criteria
and then periodically check for new files. A file is considered new only if
its filename differs from that of every file that has already been read.
We want to add an option that also treats a file as new if its timestamp
differs from that of an already-read file with the same name.
See the following design doc for more detail:
[https://docs.google.com/document/d/1xnacyLGNh6rbPGgTAh5D1gZVR8rHUBsMMRV3YkvlL08/|https://docs.google.com/document/d/1xnacyLGNh6rbPGgTAh5D1gZVR8rHUBsMMRV3YkvlL08/e]
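The difference between the two notions of "new" can be sketched in plain Java.
This is only an illustration and not Beam's implementation: the class
NewFileTracker, its considerTimestamp flag, and the isNew method are
hypothetical names, and the timestamp is assumed to be a last-modified time in
milliseconds.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: decides whether a polled file should be read (again). */
final class NewFileTracker {
    // false: current behavior (filename only); true: proposed behavior.
    private final boolean considerTimestamp;
    // Last timestamp recorded for each filename that has been reported as new.
    private final Map<String, Long> lastSeen = new HashMap<>();

    NewFileTracker(boolean considerTimestamp) {
        this.considerTimestamp = considerTimestamp;
    }

    /** Returns true if the file counts as new under the configured rule. */
    boolean isNew(String filename, long lastModifiedMillis) {
        Long previous = lastSeen.get(filename);
        boolean unseenName = previous == null;
        boolean changedTimestamp =
            considerTimestamp
                && previous != null
                && previous.longValue() != lastModifiedMillis;
        if (unseenName || changedTimestamp) {
            lastSeen.put(filename, lastModifiedMillis);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        NewFileTracker byName = new NewFileTracker(false);
        System.out.println(byName.isNew("a.txt", 100L)); // first sighting
        System.out.println(byName.isNew("a.txt", 200L)); // same name, newer timestamp

        NewFileTracker byTimestamp = new NewFileTracker(true);
        System.out.println(byTimestamp.isNew("a.txt", 100L)); // first sighting
        System.out.println(byTimestamp.isNew("a.txt", 200L)); // same name, newer timestamp
    }
}
```

With considerTimestamp set to false, a rewritten file with an unchanged name
is ignored, which mirrors the behavior described above. With it set to true,
the same file is reported again whenever its timestamp changes.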
was:
In TextIO and AvroIO, we have a configuration option called watchForNewFiles,
and in FileIO.MatchConfiguration, we have an option called watchInterval. Right
now, these match any files according to the filtering criteria, and then
periodically check for new files. A file is determined to be new if it has a
different filename than a file that has already been read.
We want to add an option to choose to consider a file new if it has a different
timestamp from an existing file, even if the file itself has the same name.
See the following design doc for more detail:
[https://docs.google.com/document/d/1xnacyLGNh6rbPGgTAh5D1gZVR8rHUBsMMRV3YkvlL08/e]
> Update watchForNewFiles to allow reading already read files with a new
> timestamp
> --------------------------------------------------------------------------------
>
> Key: BEAM-14267
> URL: https://issues.apache.org/jira/browse/BEAM-14267
> Project: Beam
> Issue Type: New Feature
> Components: io-java-files
> Reporter: Yi Hu
> Assignee: Yi Hu
> Priority: P2
>