[ https://issues.apache.org/jira/browse/BEAM-747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15582963#comment-15582963 ]
Luke Cwik commented on BEAM-747:
--------------------------------

The number of shards is not deterministic without explicitly limiting it on the sink. Also, requiring support for limited parallelism increases the barrier to entry for this test for runners.

Typically if you get one filename for the YYY-of-ZZZ case, you can figure out all the remaining by parsing out the bounds and knowing exactly how many files exist and what they are named.

> Text checksum verifier is not resilient to eventually consistent filesystems
> ----------------------------------------------------------------------------
>
>                 Key: BEAM-747
>                 URL: https://issues.apache.org/jira/browse/BEAM-747
>             Project: Beam
>          Issue Type: Bug
>          Components: testing
>    Affects Versions: Not applicable
>            Reporter: Daniel Halperin
>            Assignee: Mark Liu
>
> Example 1:
> https://builds.apache.org/job/beam_PreCommit_MavenVerify/3934/org.apache.beam$beam-examples-java/console
> Here it looks like we need to retry listing files, at least a little bit, if
> none are found. They did show up:
> {code}
> gsutil ls gs://temp-storage-for-end-to-end-tests/WordCountIT-2016-10-13-12-37-02-467/output/results\*
> gs://temp-storage-for-end-to-end-tests/WordCountIT-2016-10-13-12-37-02-467/output/results-00000-of-00003
> gs://temp-storage-for-end-to-end-tests/WordCountIT-2016-10-13-12-37-02-467/output/results-00001-of-00003
> gs://temp-storage-for-end-to-end-tests/WordCountIT-2016-10-13-12-37-02-467/output/results-00002-of-00003
> {code}
> Example 2:
> https://builds.apache.org/job/beam_PostCommit_MavenVerify/org.apache.beam$beam-examples-java/1525/testReport/junit/org.apache.beam.examples/WordCountIT/testE2EWordCount/
> Here it looks like we need to fill in the shard template if the filesystem
> does not give us a consistent result:
> {code}
> Oct 14, 2016 12:31:16 AM org.apache.beam.sdk.testing.FileChecksumMatcher readLines
> INFO: [0 of 1] Read 162 lines from file: gs://temp-storage-for-end-to-end-tests/WordCountIT-2016-10-14-00-25-55-609/output/results-00000-of-00003
> Oct 14, 2016 12:31:16 AM org.apache.beam.sdk.testing.FileChecksumMatcher readLines
> INFO: [1 of 1] Read 144 lines from file: gs://temp-storage-for-end-to-end-tests/WordCountIT-2016-10-14-00-25-55-609/output/results-00002-of-00003
> Oct 14, 2016 12:31:16 AM org.apache.beam.sdk.testing.FileChecksumMatcher matchesSafely
> INFO: Generated checksum for output data: aec68948b2515e6ea35fd1ed7649c267a10a01e5
> {code}
> We missed shard 1-of-3 and hence got the wrong checksum.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
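
The suggestion in the comment above (derive every shard name from a single YYY-of-ZZZ filename) could look roughly like the sketch below. It is only an illustration, written in Python for brevity, and is not the FileChecksumMatcher code. The function names expected_shards and missing_shards are hypothetical, and the sketch assumes the "<prefix>-SSSSS-of-NNNNN" naming seen in the listings above.

```python
import re

# Assumption: output files are named "<prefix>-SSSSS-of-NNNNN", as in the
# gsutil listing quoted in this issue. Other shard templates are not handled.
SHARD_RE = re.compile(r"^(?P<prefix>.*)-(?P<shard>\d+)-of-(?P<total>\d+)$")


def expected_shards(path):
    """Return every shard name implied by one '-SSSSS-of-NNNNN' path."""
    match = SHARD_RE.match(path)
    if match is None:
        raise ValueError("not a sharded output path: %r" % path)
    prefix = match.group("prefix")
    width = len(match.group("shard"))   # keep the original zero padding
    total_text = match.group("total")   # reuse the total exactly as written
    return [
        "%s-%s-of-%s" % (prefix, str(index).zfill(width), total_text)
        for index in range(int(total_text))
    ]


def missing_shards(listed):
    """Return shard names implied by the listing but absent from it."""
    listed = set(listed)
    if not listed:
        # Nothing to infer from; the caller has to retry the listing instead.
        raise ValueError("empty listing, cannot infer the shard count")
    any_path = next(iter(listed))
    return [path for path in expected_shards(any_path) if path not in listed]
```

Given the two files that were read in Example 2, missing_shards would report results-00001-of-00003 as absent, so a verifier could wait for that file or read it by name rather than checksum a partial listing. An empty listing, as in Example 1, carries no shard count and still needs a retry.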