[ https://issues.apache.org/jira/browse/BEAM-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767937#comment-15767937 ]
Daniel Halperin commented on BEAM-1190:
---------------------------------------
I do not think this is generally safe -- it may mask underlying bugs. For
example, we should never invoke this code unless the filesystem is known to be
eventually consistent for list operations but consistent for stat.
This change does not obviate the need for [BEAM-60], because users may want
to go the other way and expand the inconsistent list they get. I propose you
package this logic up in whatever the new name for IOChannelUtils is, as one
of the things users can do in the code they run at expand-time.
Bringing the user into the loop is also nice because it makes them deal with
eventual consistency up front. We get burned a lot by users who don't realize
what their globs really mean.
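
To make the proposal concrete, here is a rough sketch of the kind of helper a
user could call at expand-time. It is written in Python only for brevity and
is not Beam API: the names drop_missing and add_expected are hypothetical, and
os.path.exists stands in for whatever stat call the filesystem provides. It
assumes stat is consistent even when listing is not.

```python
import os
from typing import Callable, Iterable, List


def drop_missing(
    listed: Iterable[str],
    exists: Callable[[str], bool] = os.path.exists,
) -> List[str]:
    """Keep only the listed paths that a stat-style check confirms exist.

    Hypothetical helper: only safe when `exists` is consistent even though
    the listing that produced `listed` may not be.
    """
    return [path for path in listed if exists(path)]


def add_expected(
    listed: Iterable[str],
    expected: Iterable[str],
    exists: Callable[[str], bool] = os.path.exists,
) -> List[str]:
    """Go the other way: add files the caller expects but the listing missed.

    Hypothetical helper: an expected path is added only if the stat-style
    check confirms it, and the order of `listed` is preserved.
    """
    result = list(listed)
    seen = set(result)
    for path in expected:
        if path not in seen and exists(path):
            result.append(path)
            seen.add(path)
    return result
```

Because the user has to choose between the two calls (or neither), they are
forced to decide up front how their glob should behave under eventual
consistency, rather than having FileBasedSource silently drop files.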
> FileBasedSource should ignore files that matched the glob but don't exist
> -------------------------------------------------------------------------
>
> Key: BEAM-1190
> URL: https://issues.apache.org/jira/browse/BEAM-1190
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Reporter: Eugene Kirpichov
> Assignee: Eugene Kirpichov
>
> See user issue:
> http://stackoverflow.com/questions/41251741/coping-with-eventual-consistency-of-gcs-bucket-listing
> We should, after globbing the files in FileBasedSource, individually stat
> every file and remove those that don't exist, to account for the possibility
> that glob yielded non-existing files due to eventual consistency.