[jira] [Work logged] (BEAM-7463) Bigquery IO ITs are flaky: incorrect checksum

ASF GitHub Bot (JIRA) Tue, 04 Jun 2019 22:01:33 -0700


     [ 
https://issues.apache.org/jira/browse/BEAM-7463?focusedWorklogId=254191&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254191
 ]


ASF GitHub Bot logged work on BEAM-7463:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/Jun/19 05:00
            Start Date: 05/Jun/19 05:00
    Worklog Time Spent: 10m 
      Work Description: tvalentyn commented on issue #8751: [BEAM-7463] 
parallelize BQ IT tests
URL: https://github.com/apache/beam/pull/8751#issuecomment-498938989
 
 
   Hi @Juta, could you please comment on which variables are being shared by 
test scenarios? This change should improve test parallelism in test modules 
that have multiple test cases; however, I believe it does not remove side 
effects in existing test scenarios: the tests you modify use test-case-level 
fixtures (`setUp()`). For every test method, a new instance of TestCase is 
created, and `setUp()` is called on that instance.
   
   From unittest docs: 
https://docs.python.org/3/library/unittest.html#class-and-module-fixtures
   
   >  A new TestCase instance is created as a unique test fixture used to 
execute each individual test method.
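   
   The per-method fixture behaviour quoted above can be checked with a minimal 
sketch. The class and function names here (`FixtureDemo`, `run_demo`) are 
hypothetical and are not taken from the Beam test suite; the sketch only 
assumes the standard `unittest` module.

```python
import unittest


class FixtureDemo(unittest.TestCase):
    """Hypothetical test case: each test method gets its own instance."""

    setup_calls = 0  # class-level counter, used only to observe setUp()

    def setUp(self):
        # Instance attribute: recreated for every test method.
        self.items = []
        type(self).setup_calls += 1

    def test_a(self):
        self.items.append("a")
        # Would fail if test_b had already mutated the same list.
        self.assertEqual(self.items, ["a"])

    def test_b(self):
        self.items.append("b")
        # Would fail if test_a had already mutated the same list.
        self.assertEqual(self.items, ["b"])


def run_demo():
    """Run both tests in one process and return the TestResult."""
    FixtureDemo.setup_calls = 0
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(FixtureDemo)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    outcome = run_demo()
    print(outcome.testsRun, outcome.wasSuccessful(), FixtureDemo.setup_calls)
```

   Both tests pass even though they run in the same process, because state 
created in `setUp()` lives on a fresh instance each time.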
   
   This change would indeed take effect if we used module-level or class-level 
fixtures (e.g. `setUpClass()`): with `_multiprocess_can_split_`, module-level 
fixtures would be called multiple times. However, I don't see usages of those 
types of fixtures in the integration tests, and we should not use them since 
they can create side effects. So it is not clear to me which variables are 
currently shared in the tests and would no longer be shared with this change. 
If you think I am missing something, could you post a simplified code snippet 
that demonstrates a potential race or side effect in existing tests? Thank you.
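   
   For contrast, here is a hedged sketch of the kind of side effect a 
class-level fixture can introduce. It is not taken from the existing Beam 
tests, which per the comment above do not use such fixtures; `run_shared_demo`, 
`SharedFixtureDemo` and `shared_rows` are hypothetical names.

```python
import unittest


def run_shared_demo():
    """Build and run a hypothetical test case that shares class-level state.

    The class is defined inside this function so that test runners scanning
    the module do not collect the intentionally failing case.
    """

    class SharedFixtureDemo(unittest.TestCase):

        @classmethod
        def setUpClass(cls):
            # Created once per class, so every test method sees the same list.
            cls.shared_rows = []

        def test_first(self):
            self.shared_rows.append("first")
            self.assertEqual(len(self.shared_rows), 1)

        def test_second(self):
            self.shared_rows.append("second")
            # Holds only if this test runs in isolation; after test_first
            # the shared list already contains one element.
            self.assertEqual(len(self.shared_rows), 1)

    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SharedFixtureDemo)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    outcome = run_shared_demo()
    for test, _traceback in outcome.failures:
        print("failed:", test.id())
```

   With the default alphabetical ordering, `test_second` fails when both tests 
run in one process, because the list created in `setUpClass()` outlives each 
test method. State created in `setUp()` does not have this problem.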
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 254191)
    Time Spent: 40m  (was: 0.5h)

> Bigquery IO ITs are flaky: incorrect checksum
> ---------------------------------------------
>
>                 Key: BEAM-7463
>                 URL: https://issues.apache.org/jira/browse/BEAM-7463
>             Project: Beam
>          Issue Type: Bug
>          Components: io-python-gcp
>            Reporter: Valentyn Tymofieiev
>            Assignee: Juta Staes
>            Priority: Major
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> {noformat}
> 15:03:38 FAIL: test_big_query_new_types 
> (apache_beam.io.gcp.big_query_query_to_table_it_test.BigQueryQueryToTableIT)
> 15:03:38 
> ----------------------------------------------------------------------
> 15:03:38 Traceback (most recent call last):
> 15:03:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py",
>  line 211, in test_big_query_new_types
> 15:03:38     big_query_query_to_table_pipeline.run_bq_pipeline(options)
> 15:03:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/io/gcp/big_query_query_to_table_pipeline.py",
>  line 82, in run_bq_pipeline
> 15:03:38     result = p.run()
> 15:03:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 107, in run
> 15:03:38     else test_runner_api))
> 15:03:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/pipeline.py",
>  line 406, in run
> 15:03:38     self._options).run(False)
> 15:03:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/pipeline.py",
>  line 419, in run
> 15:03:38     return self.runner.run_pipeline(self, self._options)
> 15:03:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/runners/direct/test_direct_runner.py",
>  line 51, in run_pipeline
> 15:03:38     hc_assert_that(self.result, pickler.loads(on_success_matcher))
> 15:03:38 AssertionError: 
> 15:03:38 Expected: (Test pipeline expected terminated in state: DONE and 
> Expected checksum is 24de460c4d344a4b77ccc4cc1acb7b7ffc11a214)
> 15:03:38      but: Expected checksum is 
> 24de460c4d344a4b77ccc4cc1acb7b7ffc11a214 Actual checksum is 
> da39a3ee5e6b4b0d3255bfef95601890afd80709
> {noformat}
> [~Juta] could this be caused by changes to Bigquery matcher? 
> https://github.com/apache/beam/pull/8621/files#diff-f1ec7e3a3e7e2e5082ddb7043954c108R134
>  
> cc: [~pabloem] [~chamikara] [~apilloud]
> A recent postcommit run has BQ failures in other tests as well: 
> https://builds.apache.org/job/beam_PostCommit_Python3_Verify/1000/consoleFull



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
