[ https://issues.apache.org/jira/browse/BEAM-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201184#comment-17201184 ]

Udi Meiri commented on BEAM-7463:
---------------------------------

The verifier is reading back 0 rows. The table seems to be deleted only after 
the verifier has had a chance to read, so premature deletion is not the cause. 
The logs also say "writing 4 rows", yet the verifier sees none, which implies 
that the write to output_table either doesn't happen or happens asynchronously.

It seems that writing with the direct runner in native mode (BigQuerySink) falls 
back to streaming inserts (BigQueryWriter).
The more maintained streaming-inserts path is BigQueryWriteFn (via 
WriteToBigQuery).
Both, however, use BigQueryWrapper.insert_rows() as the underlying implementation.

Note that the other tests in this class use BQ load jobs to write (95% sure).
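
For illustration only (this is not the test's code), a minimal sketch of how the 
two write paths can be selected explicitly in the Python SDK via WriteToBigQuery's 
method parameter; the table, schema, and row values below are hypothetical:

{code}
import apache_beam as beam
from apache_beam.io.gcp.bigquery import WriteToBigQuery

with beam.Pipeline() as p:
  rows = p | beam.Create([{'name': 'a', 'value': 1}])  # hypothetical rows

  # Streaming inserts: rows land in the streaming buffer first and may not be
  # immediately visible to a query.
  _ = rows | 'StreamWrite' >> WriteToBigQuery(
      'my_project:my_dataset.output_table',  # hypothetical table
      schema='name:STRING,value:INTEGER',
      method=WriteToBigQuery.Method.STREAMING_INSERTS)

  # Load jobs: the data is queryable once the load job completes, which is what
  # the other tests in this class appear to rely on.
  _ = rows | 'LoadWrite' >> WriteToBigQuery(
      'my_project:my_dataset.output_table',  # hypothetical table
      schema='name:STRING,value:INTEGER',
      method=WriteToBigQuery.Method.FILE_LOADS)
{code}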

{code}
Streaming inserts reside temporarily in the streaming buffer, which has 
different availability characteristics than managed storage.
{code}
https://cloud.google.com/bigquery/docs/error-messages#missingunavailable-data

Also: 
https://cloud.google.com/bigquery/streaming-data-into-bigquery#dataavailability

My conclusion is that this test depends on streamed data being immediately 
available, but that is not guaranteed.
The suggested solution is to modify the matcher to retry for a few seconds until 
the streaming buffer has been processed (like 
BigqueryFullResultStreamingMatcher does).
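
A rough sketch of such a retrying matcher, assuming PyHamcrest and the 
google-cloud-bigquery client; the class name, checksum computation, and 
parameters are illustrative and not the actual BigqueryFullResultStreamingMatcher 
implementation:

{code}
import hashlib
import time

from google.cloud import bigquery
from hamcrest.core.base_matcher import BaseMatcher


class RetryingBigqueryChecksumMatcher(BaseMatcher):
  """Re-queries BigQuery until the result checksum matches or a timeout expires."""

  def __init__(self, project, query, expected_checksum,
               timeout_secs=60, poll_secs=5):
    self.project = project
    self.query = query
    self.expected_checksum = expected_checksum
    self.timeout_secs = timeout_secs
    self.poll_secs = poll_secs
    self.actual_checksum = None

  def _compute_checksum(self):
    # Illustrative checksum: SHA-1 over the sorted string form of all rows.
    client = bigquery.Client(project=self.project)
    rows = sorted(str(dict(row)) for row in client.query(self.query).result())
    return hashlib.sha1(''.join(rows).encode('utf-8')).hexdigest()

  def _matches(self, _pipeline_result):
    deadline = time.time() + self.timeout_secs
    while True:
      self.actual_checksum = self._compute_checksum()
      if self.actual_checksum == self.expected_checksum:
        return True
      if time.time() > deadline:
        return False
      # Rows may still be sitting in the streaming buffer; wait and re-query.
      time.sleep(self.poll_secs)

  def describe_to(self, description):
    description.append_text('Expected checksum is ') \
        .append_text(self.expected_checksum)

  def describe_mismatch(self, _item, mismatch_description):
    mismatch_description.append_text('Actual checksum is ') \
        .append_text(str(self.actual_checksum))
{code}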


> BigQueryQueryToTableIT is flaky on Direct runner in PostCommit suites: 
> incorrect checksum 
> ------------------------------------------------------------------------------------------
>
>                 Key: BEAM-7463
>                 URL: https://issues.apache.org/jira/browse/BEAM-7463
>             Project: Beam
>          Issue Type: Bug
>          Components: io-py-gcp
>            Reporter: Valentyn Tymofieiev
>            Assignee: Udi Meiri
>            Priority: P1
>              Labels: currently-failing
>             Fix For: Not applicable
>
>          Time Spent: 6h
>  Remaining Estimate: 0h
>
> {noformat}
> 15:03:38 FAIL: test_big_query_new_types 
> (apache_beam.io.gcp.big_query_query_to_table_it_test.BigQueryQueryToTableIT)
> 15:03:38 
> ----------------------------------------------------------------------
> 15:03:38 Traceback (most recent call last):
> 15:03:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py",
>  line 211, in test_big_query_new_types
> 15:03:38     big_query_query_to_table_pipeline.run_bq_pipeline(options)
> 15:03:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/io/gcp/big_query_query_to_table_pipeline.py",
>  line 82, in run_bq_pipeline
> 15:03:38     result = p.run()
> 15:03:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 107, in run
> 15:03:38     else test_runner_api))
> 15:03:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/pipeline.py",
>  line 406, in run
> 15:03:38     self._options).run(False)
> 15:03:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/pipeline.py",
>  line 419, in run
> 15:03:38     return self.runner.run_pipeline(self, self._options)
> 15:03:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/runners/direct/test_direct_runner.py",
>  line 51, in run_pipeline
> 15:03:38     hc_assert_that(self.result, pickler.loads(on_success_matcher))
> 15:03:38 AssertionError: 
> 15:03:38 Expected: (Test pipeline expected terminated in state: DONE and 
> Expected checksum is 24de460c4d344a4b77ccc4cc1acb7b7ffc11a214)
> 15:03:38      but: Expected checksum is 
> 24de460c4d344a4b77ccc4cc1acb7b7ffc11a214 Actual checksum is 
> da39a3ee5e6b4b0d3255bfef95601890afd80709
> {noformat}
> [~Juta] could this be caused by changes to Bigquery matcher? 
> https://github.com/apache/beam/pull/8621/files#diff-f1ec7e3a3e7e2e5082ddb7043954c108R134
>  
> cc: [~pabloem] [~chamikara] [~apilloud]
> A recent postcommit run has BQ failures in other tests as well: 
> https://builds.apache.org/job/beam_PostCommit_Python3_Verify/1000/consoleFull


