[ https://issues.apache.org/jira/browse/BEAM-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765711#comment-15765711 ]
Paul Findlay edited comment on BEAM-1190 at 12/21/16 12:52 AM:
---------------------------------------------------------------

[~dhalp...@google.com] Correct me if I'm wrong.. but isn't FileBasedSource.createReader basically already doing a stat for each file in the expanded list but swallowing the error if there is one, and leaving it for FileBasedReader.startImpl to blow up?

We are just asking for the method to not be final so we can treat the different sub-classes of IOException appropriately (for our pipeline). But would love to know if there is scary behaviour we haven't considered.

was (Author: p...@findlay.net.nz):
[~dhalp...@google.com] Correct me if I'm wrong.. but isn't FileBasedSource.createReader basically already doing a stat for each file in the expanded list but swallowing the error if there is one, and leaving it for startImpl to blow up?

We are just asking for the method to not be final so we can treat the different sub-classes of IOException appropriately (for our pipeline). But would love to know if there is scary behaviour we haven't considered.

> FileBasedSource should ignore files that matched the glob but don't exist
> -------------------------------------------------------------------------
>
>                 Key: BEAM-1190
>                 URL: https://issues.apache.org/jira/browse/BEAM-1190
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>            Reporter: Eugene Kirpichov
>            Assignee: Eugene Kirpichov
>
> See user issue:
> http://stackoverflow.com/questions/41251741/coping-with-eventual-consistency-of-gcs-bucket-listing
> We should, after globbing the files in FileBasedSource, individually stat
> every file and remove those that don't exist, to account for the possibility
> that glob yielded non-existing files due to eventual consistency.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
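
For illustration only, here is a minimal sketch of the "stat every file and drop the missing ones" step from the issue description, combined with the point in the comment about treating sub-classes of IOException differently. It uses plain java.nio on a local filesystem rather than Beam's own file abstractions, and the names ExistingFileFilter and keepExisting are hypothetical, not part of the Beam SDK. A missing file (NoSuchFileException) is skipped; any other IOException is rethrown so it is not silently swallowed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not part of the Beam SDK.
final class ExistingFileFilter {

    /**
     * Stats each path produced by a glob and keeps only those that exist.
     * A missing file is treated as a stale listing entry and skipped;
     * any other I/O failure is surfaced to the caller.
     */
    static List<Path> keepExisting(List<Path> globbed) {
        List<Path> existing = new ArrayList<>();
        for (Path candidate : globbed) {
            try {
                // The "stat": throws NoSuchFileException if the path is gone.
                Files.readAttributes(candidate, BasicFileAttributes.class);
                existing.add(candidate);
            } catch (NoSuchFileException e) {
                // Listed by the glob but not present: ignore it.
            } catch (IOException e) {
                // Permissions, network errors, etc. should not be hidden.
                throw new UncheckedIOException(e);
            }
        }
        return existing;
    }

    public static void main(String[] args) throws IOException {
        Path real = Files.createTempFile("beam-1190-", ".txt");
        Path ghost = real.resolveSibling(real.getFileName() + ".missing");
        try {
            // Only the temp file that actually exists should be printed.
            System.out.println(keepExisting(Arrays.asList(real, ghost)));
        } finally {
            Files.delete(real);
        }
    }
}
```

Note that a check like this only narrows the window: a file can still disappear between the stat and the later read, so the reader still has to cope with a missing file when it opens it.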