[ 
https://issues.apache.org/jira/browse/BEAM-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767818#comment-15767818
 ] 

Daniel Halperin commented on BEAM-1190:
---------------------------------------

Not for very long -- the stat at open time is being removed, since we already 
get the information we need from the list call but currently throw it away.

How would you feel about the ability to execute code in the worker when the 
glob is expanded? I think checking which files actually exist at that point, 
deciding in one centralized place and time which files you want to read, and 
committing to that decision for later is probably a simpler and safer solution.
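
A rough sketch of the idea, using plain java.nio rather than Beam's actual 
file APIs. GlobSnapshot and both method names are hypothetical, and the local 
existence check only stands in for a stat against the real filesystem:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Beam: expand the glob once, check
// existence once, and hand back a fixed list for all later reads.
final class GlobSnapshot {

    // Step 1: expand the glob. On an eventually consistent store this
    // listing may still name files that are already gone.
    static List<Path> list(Path dir, String glob) throws IOException {
        List<Path> listed = new ArrayList<>();
        try (DirectoryStream<Path> matches = Files.newDirectoryStream(dir, glob)) {
            for (Path p : matches) {
                listed.add(p);
            }
        }
        Collections.sort(listed);
        return listed;
    }

    // Step 2: keep only the files that exist right now and commit to that
    // decision by returning an unmodifiable snapshot.
    static List<Path> commitExisting(List<Path> listed) {
        List<Path> kept = new ArrayList<>();
        for (Path p : listed) {
            if (Files.isRegularFile(p)) {
                kept.add(p);
            }
        }
        return Collections.unmodifiableList(kept);
    }
}
```

Later stages would read only from the list returned by commitExisting and 
never re-expand the glob, so the decision is made in exactly one place.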

> FileBasedSource should ignore files that matched the glob but don't exist
> -------------------------------------------------------------------------
>
>                 Key: BEAM-1190
>                 URL: https://issues.apache.org/jira/browse/BEAM-1190
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>            Reporter: Eugene Kirpichov
>            Assignee: Eugene Kirpichov
>
> See user issue:
> http://stackoverflow.com/questions/41251741/coping-with-eventual-consistency-of-gcs-bucket-listing
> We should, after globbing the files in FileBasedSource, individually stat 
> every file and remove those that don't exist, to account for the possibility 
> that glob yielded non-existing files due to eventual consistency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
