[ 
https://issues.apache.org/jira/browse/BEAM-747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15582963#comment-15582963
 ] 

Luke Cwik commented on BEAM-747:
--------------------------------

The number of shards is not deterministic without explicitly limiting it on the 
sink. Also, requiring support for limited parallelism increases the barrier to 
entry for this test for runners. Typically if you get one filename for the 
YYY-of-ZZZ case, you can figure out all the remaining by parsing out the bounds 
and knowing exactly how many files exist and what they are named.

> Text checksum verifier is not resilient to eventually consistent filesystems
> ----------------------------------------------------------------------------
>
>                 Key: BEAM-747
>                 URL: https://issues.apache.org/jira/browse/BEAM-747
>             Project: Beam
>          Issue Type: Bug
>          Components: testing
>    Affects Versions: Not applicable
>            Reporter: Daniel Halperin
>            Assignee: Mark Liu
>
> Example 1: 
> https://builds.apache.org/job/beam_PreCommit_MavenVerify/3934/org.apache.beam$beam-examples-java/console
> Here it looks like we need to retry listing files, at least a little bit, if 
> none are found. They did show up:
> {code}
> gsutil ls 
> gs://temp-storage-for-end-to-end-tests/WordCountIT-2016-10-13-12-37-02-467/output/results\*
> gs://temp-storage-for-end-to-end-tests/WordCountIT-2016-10-13-12-37-02-467/output/results-00000-of-00003
> gs://temp-storage-for-end-to-end-tests/WordCountIT-2016-10-13-12-37-02-467/output/results-00001-of-00003
> gs://temp-storage-for-end-to-end-tests/WordCountIT-2016-10-13-12-37-02-467/output/results-00002-of-00003
> {code}
> Example 2: 
> https://builds.apache.org/job/beam_PostCommit_MavenVerify/org.apache.beam$beam-examples-java/1525/testReport/junit/org.apache.beam.examples/WordCountIT/testE2EWordCount/
> Here it looks like we need to fill in the shard template if the filesystem 
> does not give us a consistent result:
> {code}
> Oct 14, 2016 12:31:16 AM org.apache.beam.sdk.testing.FileChecksumMatcher 
> readLines
> INFO: [0 of 1] Read 162 lines from file: 
> gs://temp-storage-for-end-to-end-tests/WordCountIT-2016-10-14-00-25-55-609/output/results-00000-of-00003
> Oct 14, 2016 12:31:16 AM org.apache.beam.sdk.testing.FileChecksumMatcher 
> readLines
> INFO: [1 of 1] Read 144 lines from file: 
> gs://temp-storage-for-end-to-end-tests/WordCountIT-2016-10-14-00-25-55-609/output/results-00002-of-00003
> Oct 14, 2016 12:31:16 AM org.apache.beam.sdk.testing.FileChecksumMatcher 
> matchesSafely
> INFO: Generated checksum for output data: 
> aec68948b2515e6ea35fd1ed7649c267a10a01e5
> {code}
> We missed shard 1-of-3 and hence got the wrong checksum.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to