[ 
https://issues.apache.org/jira/browse/BEAM-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15924792#comment-15924792
 ] 

Mark Liu commented on BEAM-1715:
--------------------------------

The file matcher contains a function to describe the mismatch:
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/tests/pipeline_verifiers.py#L116
I think this is essentially a GCS consistency problem, and it also happens in 
Java. The output match was executed twice: it failed the first time and 
succeeded the second. (The second execution retrieves the mismatch message; 
this is an implementation detail of hamcrest.all_of.)

I'd like to add a wait of about 20 seconds before verification so that all 
output files are ready on GCS.
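
A minimal sketch of the wait-then-verify idea, assuming the verifier 
receives a callable that reads all output lines. The names compute_checksum, 
verify_after_wait and read_lines are hypothetical and are not the actual 
pipeline_verifiers.py API; SHA-1 over sorted lines is also an assumption, 
chosen only because the digests in the log are 40 hex characters long.

```python
import hashlib
import time


def compute_checksum(lines):
    """Return a SHA-1 hex digest over the sorted lines (order-independent).

    Assumption: this is an illustrative checksum, not necessarily the one
    the real file matcher computes.
    """
    digest = hashlib.sha1()
    for line in sorted(lines):
        digest.update(line.encode("utf-8"))
    return digest.hexdigest()


def verify_after_wait(read_lines, expected_checksum, wait_seconds=20,
                      sleep=time.sleep):
    """Sleep once, then read the output a single time and compare checksums.

    read_lines is a hypothetical zero-argument callable that returns every
    line found under the output path. sleep is injectable so that tests do
    not have to wait in real time.
    """
    sleep(wait_seconds)
    lines = read_lines()
    actual = compute_checksum(lines)
    # Return the line count and digest too, so a failure message can show them.
    return actual == expected_checksum, len(lines), actual
```

Returning the line count and the actual checksum alongside the boolean 
lets the caller report both values in one place, instead of reading the 
files a second time just to build the mismatch message.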

> Wordcount on Dataflow checksum mismatch
> ---------------------------------------
>
>                 Key: BEAM-1715
>                 URL: https://issues.apache.org/jira/browse/BEAM-1715
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py
>            Reporter: Ahmet Altay
>            Assignee: Mark Liu
>
> This run failed: 
> https://builds.apache.org/view/Beam/job/beam_PostCommit_Python_Verify/1502/consoleFull
> Output is: 
> root: INFO: Read from given path 
> gs://temp-storage-for-end-to-end-tests/py-wordcount-cloud/output/py-wordcount-1489489356/results*-of-*,
>  3179 lines, checksum: a9bcb4acd65daf8f6a9ac5e026de7803cc09f662.
> root: INFO: Read from given path 
> gs://temp-storage-for-end-to-end-tests/py-wordcount-cloud/output/py-wordcount-1489489356/results*-of-*,
>  4784 lines, checksum: 33535a832b7db6d78389759577d4ff495980b9c0.
> Dataflow job: 
> https://pantheon.corp.google.com/dataflow/job/2017-03-14_05_14_52-13300760513112605405?pli=1&project=apache-beam-testing
> Mark, can you confirm that this is actually a checksum mismatch and not a 
> failure in the test framework? And could you also make the error clearer, 
> add a message like mismatch ....



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
