[
https://issues.apache.org/jira/browse/BEAM-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201184#comment-17201184
]
Udi Meiri commented on BEAM-7463:
---------------------------------
The verifier is reading back 0 rows. The table seems to be deleted only after
the verifier has had a chance to read. The logs also say "writing 4 rows",
which implies that the write to output_table either doesn't happen or is
happening asynchronously.
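A quick sanity check supports the 0-rows reading: the actual checksum in the
failure quoted below, da39a3ee5e6b4b0d3255bfef95601890afd80709, is the SHA-1 of
empty input. The sketch below assumes the matcher hashes the stringified rows
with SHA-1. checksum_rows is a hypothetical stand-in, not the matcher's actual
code.

```python
import hashlib

# "Actual checksum" copied from the test failure output.
ACTUAL_FROM_LOG = "da39a3ee5e6b4b0d3255bfef95601890afd80709"

# SHA-1 digest of zero bytes of input.
EMPTY_SHA1 = hashlib.sha1(b"").hexdigest()


def checksum_rows(rows):
    """Hypothetical row checksum: SHA-1 over the sorted, stringified rows."""
    digest = hashlib.sha1()
    for row_str in sorted(str(row) for row in rows):
        digest.update(row_str.encode("utf-8"))
    return digest.hexdigest()


# With no rows, nothing is fed to the hash, so the result is the empty digest.
print(checksum_rows([]) == ACTUAL_FROM_LOG)
```

Any checksum scheme that feeds nothing into SHA-1 when the result set is empty
would produce this same value.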
It seems that writing with the direct runner in native mode (BigQuerySink)
falls back to streaming inserts (BigQueryWriter).
The more maintained version of streaming inserts is in BigQueryWriteFn (via
WriteToBigQuery).
However, both use BigQueryWrapper.insert_rows() as the underlying implementation.
Note that the other tests in this class use BQ Load jobs to write. (95% sure)
{code}
Streaming inserts reside temporarily in the streaming buffer, which has
different availability characteristics than managed storage.
{code}
https://cloud.google.com/bigquery/docs/error-messages#missingunavailable-data
Also:
https://cloud.google.com/bigquery/streaming-data-into-bigquery#dataavailability
My conclusion is that this test depends on streamed data being immediately
available, but that is not guaranteed.
Suggested solution is to modify the matcher to retry for a few seconds until
the streaming buffer has been processed (like
BigqueryFullResultStreamingMatcher).
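A minimal sketch of the retry idea, independent of Beam: keep re-running the
query until it returns rows or the attempts run out. The names
query_with_retry, run_query, max_attempts and wait_secs are all hypothetical,
and the real matcher may well use a different stopping condition (for example,
waiting for an expected row count).

```python
import time


def query_with_retry(run_query, max_attempts=5, wait_secs=1.0, sleep=time.sleep):
    """Call run_query() until it returns a non-empty result or attempts run out.

    run_query is any zero-argument callable returning a list of rows
    (a hypothetical stand-in for the matcher's BigQuery query call).
    """
    rows = []
    for attempt in range(max_attempts):
        rows = run_query()
        if rows:
            break
        # Streamed rows may not be visible yet; wait before trying again,
        # but do not sleep after the final attempt.
        if attempt < max_attempts - 1:
            sleep(wait_secs)
    return rows
```

Passing sleep as a parameter lets a unit test substitute a no-op and avoid real
delays. If the rows never show up, the function returns an empty list and the
checksum comparison fails as before, so a real failure is still reported.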
> BigQueryQueryToTableIT is flaky on Direct runner in PostCommit suites:
> incorrect checksum
> ------------------------------------------------------------------------------------------
>
> Key: BEAM-7463
> URL: https://issues.apache.org/jira/browse/BEAM-7463
> Project: Beam
> Issue Type: Bug
> Components: io-py-gcp
> Reporter: Valentyn Tymofieiev
> Assignee: Udi Meiri
> Priority: P1
> Labels: currently-failing
> Fix For: Not applicable
>
> Time Spent: 6h
> Remaining Estimate: 0h
>
> {noformat}
> 15:03:38 FAIL: test_big_query_new_types
> (apache_beam.io.gcp.big_query_query_to_table_it_test.BigQueryQueryToTableIT)
> 15:03:38
> ----------------------------------------------------------------------
> 15:03:38 Traceback (most recent call last):
> 15:03:38 File
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py",
> line 211, in test_big_query_new_types
> 15:03:38 big_query_query_to_table_pipeline.run_bq_pipeline(options)
> 15:03:38 File
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/io/gcp/big_query_query_to_table_pipeline.py",
> line 82, in run_bq_pipeline
> 15:03:38 result = p.run()
> 15:03:38 File
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/testing/test_pipeline.py",
> line 107, in run
> 15:03:38 else test_runner_api))
> 15:03:38 File
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/pipeline.py",
> line 406, in run
> 15:03:38 self._options).run(False)
> 15:03:38 File
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/pipeline.py",
> line 419, in run
> 15:03:38 return self.runner.run_pipeline(self, self._options)
> 15:03:38 File
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/runners/direct/test_direct_runner.py",
> line 51, in run_pipeline
> 15:03:38 hc_assert_that(self.result, pickler.loads(on_success_matcher))
> 15:03:38 AssertionError:
> 15:03:38 Expected: (Test pipeline expected terminated in state: DONE and
> Expected checksum is 24de460c4d344a4b77ccc4cc1acb7b7ffc11a214)
> 15:03:38 but: Expected checksum is
> 24de460c4d344a4b77ccc4cc1acb7b7ffc11a214 Actual checksum is
> da39a3ee5e6b4b0d3255bfef95601890afd80709
> {noformat}
> [~Juta] could this be caused by changes to Bigquery matcher?
> https://github.com/apache/beam/pull/8621/files#diff-f1ec7e3a3e7e2e5082ddb7043954c108R134
>
> cc: [~pabloem] [~chamikara] [~apilloud]
> A recent postcommit run has BQ failures in other tests as well:
> https://builds.apache.org/job/beam_PostCommit_Python3_Verify/1000/consoleFull