[ 
https://issues.apache.org/jira/browse/BEAM-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767818#comment-15767818
 ] 

Daniel Halperin edited comment on BEAM-1190 at 12/21/16 6:46 PM:
-----------------------------------------------------------------

Not for very long -- the stat at open-time is getting removed. We get the size 
information we need from the list call, but currently throw it away for silly 
reasons.

How would you feel about the ability to execute code in the worker when the 
glob is expanded? I think checking which files actually exist at that point, 
deciding at one centralized point in time which files you want to read, and 
committing to that decision for later is probably a simpler and safer solution.
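
A rough sketch of what such a hook could look like, to make the idea 
concrete. Every name here (GlobExpansionSketch, MatchedFile, commit) is 
hypothetical and not an existing Beam API; the sizes are assumed to come from 
the list call rather than from a separate stat.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch; none of these names are Beam APIs. */
final class GlobExpansionSketch {

    /** A matched path together with the size reported by the list call. */
    static final class MatchedFile {
        final String path;
        final long sizeBytes;

        MatchedFile(String path, long sizeBytes) {
            this.path = path;
            this.sizeBytes = sizeBytes;
        }
    }

    /**
     * Runs the user-supplied check once, at glob-expansion time, and returns
     * an unmodifiable list. Later stages would read exactly these files and
     * never re-check existence.
     */
    static List<MatchedFile> commit(List<MatchedFile> listed,
                                    Predicate<MatchedFile> keep) {
        List<MatchedFile> chosen = new ArrayList<>();
        for (MatchedFile f : listed) {
            if (keep.test(f)) {
                chosen.add(f);
            }
        }
        return Collections.unmodifiableList(chosen);
    }
}
```

The predicate is where user code would run; because the result cannot be 
modified afterwards, the decision is made in exactly one place.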


was (Author: dhalp...@google.com):
Not for very long -- the stat at open-time is getting removed as we get the 
information we need from the list call, but throw it away like we shouldn't be.

How would you feel about the ability to execute code in the worker when the 
glob is expanded. I think checking which files actually exist then and deciding 
in one centralized place in time which files you want to read (and committing 
to that decision for later) is probably a simpler and safer solution.

> FileBasedSource should ignore files that matched the glob but don't exist
> -------------------------------------------------------------------------
>
>                 Key: BEAM-1190
>                 URL: https://issues.apache.org/jira/browse/BEAM-1190
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>            Reporter: Eugene Kirpichov
>            Assignee: Eugene Kirpichov
>
> See user issue:
> http://stackoverflow.com/questions/41251741/coping-with-eventual-consistency-of-gcs-bucket-listing
> We should, after globbing the files in FileBasedSource, individually stat 
> every file and remove those that don't exist, to account for the possibility 
> that glob yielded non-existing files due to eventual consistency.
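
For illustration, the per-file check described in the issue could be sketched 
as below. This is a minimal sketch under stated assumptions: local paths and 
java.nio.file.Files.exists stand in for GCS objects and a GCS stat, and 
ExistingFilesFilter and dropMissing are hypothetical names, not part of 
FileBasedSource.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: local paths stand in for GCS objects. */
final class ExistingFilesFilter {

    /** Returns only the matched paths that exist at the time of the check. */
    static List<Path> dropMissing(List<Path> matched) {
        List<Path> existing = new ArrayList<>();
        for (Path p : matched) {
            // Files.exists plays the role of the per-file stat.
            if (Files.exists(p)) {
                existing.add(p);
            }
        }
        return existing;
    }
}
```

Note that this costs one existence check per matched file, and a file can 
still disappear between the check and the read.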



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
