[
https://issues.apache.org/jira/browse/BEAM-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883624#comment-16883624
]
Juta Staes commented on BEAM-7463:
----------------------------------
I remember removing the sorting from the BigqueryFullResultMatcher because I
think the elements within a row should not be sorted. I updated the tests that
use the BigqueryFullResultMatcher to specify the order in the query (and thus
get a deterministic order for the elements within one row). The outer
dimension is still sorted.
I looked at the tests that are failing due to the checksum error. They seem to
be using the BigqueryMatcher, which also does not sort the elements within a
row, and which does sort the outer dimension when computing the checksum.
However, the checksum is sometimes incorrect even for tests that include an
order in the query (e.g. for this test:
[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py#L196]).
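
To make the sorting behaviour described above concrete, here is a minimal
sketch of a checksum that sorts only the outer dimension. It is an
illustration under assumptions, not the actual matcher code: the function name
row_checksum, the use of str() to render a row, and the choice of SHA-1 are
all hypothetical choices made for this example.

```python
import hashlib


def row_checksum(rows):
    """Hash a query result, sorting rows but not the elements inside a row.

    Hypothetical helper for illustration; not the Beam matcher itself.
    """
    # Render each row as-is, so the order of elements within a row matters.
    rendered = [str(row).encode("utf-8") for row in rows]
    # Sort the outer dimension so the order in which rows arrive does not.
    rendered.sort()
    digest = hashlib.sha1()
    for item in rendered:
        digest.update(item)
    return digest.hexdigest()


# Row order does not change the checksum.
same = row_checksum([(1, "a"), (2, "b")]) == row_checksum([(2, "b"), (1, "a")])
# Element order within a row does change it.
differs = row_checksum([(1, 2)]) != row_checksum([(2, 1)])
# An empty result hashes to the SHA-1 digest of empty input.
empty = row_checksum([])
```

With this sketch, a query that fixes the order inside each row gives a stable
checksum no matter how the rows come back. Note that the SHA-1 digest of empty
input is da39a3ee5e6b4b0d3255bfef95601890afd80709, which is the actual
checksum reported in the log quoted below. That would be consistent with the
checksum having been computed over an empty result rather than over
differently ordered rows, although this has not been verified against the
failing runs.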
> Bigquery IO ITs are flaky: incorrect checksum
> ---------------------------------------------
>
> Key: BEAM-7463
> URL: https://issues.apache.org/jira/browse/BEAM-7463
> Project: Beam
> Issue Type: Bug
> Components: io-python-gcp
> Reporter: Valentyn Tymofieiev
> Assignee: Pablo Estrada
> Priority: Major
> Labels: currently-failing
> Time Spent: 3h 40m
> Remaining Estimate: 0h
>
> {noformat}
> 15:03:38 FAIL: test_big_query_new_types
> (apache_beam.io.gcp.big_query_query_to_table_it_test.BigQueryQueryToTableIT)
> 15:03:38
> ----------------------------------------------------------------------
> 15:03:38 Traceback (most recent call last):
> 15:03:38 File
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py",
> line 211, in test_big_query_new_types
> 15:03:38 big_query_query_to_table_pipeline.run_bq_pipeline(options)
> 15:03:38 File
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/io/gcp/big_query_query_to_table_pipeline.py",
> line 82, in run_bq_pipeline
> 15:03:38 result = p.run()
> 15:03:38 File
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/testing/test_pipeline.py",
> line 107, in run
> 15:03:38 else test_runner_api))
> 15:03:38 File
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/pipeline.py",
> line 406, in run
> 15:03:38 self._options).run(False)
> 15:03:38 File
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/pipeline.py",
> line 419, in run
> 15:03:38 return self.runner.run_pipeline(self, self._options)
> 15:03:38 File
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/runners/direct/test_direct_runner.py",
> line 51, in run_pipeline
> 15:03:38 hc_assert_that(self.result, pickler.loads(on_success_matcher))
> 15:03:38 AssertionError:
> 15:03:38 Expected: (Test pipeline expected terminated in state: DONE and
> Expected checksum is 24de460c4d344a4b77ccc4cc1acb7b7ffc11a214)
> 15:03:38 but: Expected checksum is
> 24de460c4d344a4b77ccc4cc1acb7b7ffc11a214 Actual checksum is
> da39a3ee5e6b4b0d3255bfef95601890afd80709
> {noformat}
> [~Juta] could this be caused by changes to Bigquery matcher?
> https://github.com/apache/beam/pull/8621/files#diff-f1ec7e3a3e7e2e5082ddb7043954c108R134
>
> cc: [~pabloem] [~chamikara] [~apilloud]
> A recent postcommit run has BQ failures in other tests as well:
> https://builds.apache.org/job/beam_PostCommit_Python3_Verify/1000/consoleFull