Not directly related to the 'flakiness' discussion of this thread, but I think it would be good if pre-commit checks could be run locally without GCP credentials.
On 8/16/21 6:24 PM, Luke Cwik wrote:
The fix was inadvertently run in dry-run mode, so it didn't make any changes. Since the fix was taking a couple of hours or so and it was getting late on Friday, people didn't want to start it again until today (after the weekend).
I don't think removing the few tests that run an unbounded pipeline on Dataflow for the long term is a good idea. Sure, we can disable them when there is an issue blocking folks and re-enable them once it is resolved.
On Mon, Aug 16, 2021 at 9:19 AM Andrew Pilloud <[email protected]> wrote:
The estimated two hours to a fix has long passed and we are now at 18 days since the last successful run. What is the latest estimate?

It sounds like these tests are primarily testing Dataflow, not Beam. They seem like good candidates to remove from the precommit (or limit to Dataflow runner changes) even after they are fixed.
On Fri, Aug 13, 2021 at 6:48 PM Luke Cwik <[email protected]
<mailto:[email protected]>> wrote:
The failure is due to data associated with the apache-beam-testing project, which is impacting all the Dataflow streaming tests.

Yes, disabling the tests should have happened weeks ago if:
1) The fix seemed like it was going to take a long time (unknown at the time)
2) We had confidence in test coverage minus Dataflow streaming test coverage (which I believe we did)
On Fri, Aug 13, 2021 at 6:27 PM Andrew Pilloud <[email protected]> wrote:
Or if a rollback won't fix this, can we disable the broken tests?
On Fri, Aug 13, 2021 at 6:25 PM Andrew Pilloud <[email protected]> wrote:
So you can roll back in two hours. Beam has been broken for two weeks. Why isn't a rollback appropriate?
On Fri, Aug 13, 2021 at 6:06 PM Luke Cwik <[email protected]> wrote:
From the test failures that I have seen, they have been because of BEAM-12676 [1], which is due to a bug impacting Dataflow streaming pipelines for the apache-beam-testing project. The fix is rolling out now from my understanding and should take another 2 hours or so. Rolling back master doesn't seem like what we should be doing at the moment.

1: https://issues.apache.org/jira/projects/BEAM/issues/BEAM-12676
On Fri, Aug 13, 2021 at 5:51 PM Andrew Pilloud <[email protected]> wrote:
Both the Java and Python precommits are reporting the last successful run as being in July (for both Cron and Precommit), so it looks like changes are being submitted without successful test runs. We probably shouldn't be doing that?
https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/
https://ci-beam.apache.org/job/beam_PreCommit_Python_Commit/
https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Cron/
https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Commit/
Is there a plan to get this fixed? Should we roll master back to July?
On Tue, Aug 3, 2021 at 12:24 PM Tyson Hamilton <[email protected]> wrote:
I only realized after sending that I used the IP for the link; that was by accident. Here is the proper domain link:

http://metrics.beam.apache.org/d/D81lW0pmk/post-commit-test-reliability?orgId=1
On Tue, Aug 3, 2021 at 3:22 PM Tyson Hamilton <[email protected]> wrote:
The way I've investigated precommit flake stability is by looking at the 'Post-commit Test Reliability' [1] dashboard (hah!). There is a cron job that runs precommits and, confusingly, those results are tracked in the post-commit dashboard. This week, Java is about 50% green for the pre-commit cron job, which is not great.

The plugin we installed for tracking the most flaky tests for a job doesn't handle the number of tests present in the precommit cron job well. This could be an area of improvement to help add granularity and visibility to the flakiest tests over some period of time.
[1]: http://104.154.241.245/d/D81lW0pmk/post-commit-test-reliability?orgId=1
(look for "PreCommit_Java_Cron")
On Tue, Aug 3, 2021 at 2:24 PM Andrew Pilloud <[email protected]> wrote:
Our metrics show that Java is nearly free from flakes, that Go has significant flakes, and that Python is effectively broken. It appears the metrics may be missing coverage on the Java side. The dashboard is here:

http://104.154.241.245/d/McTAiu0ik/stability-critical-jobs-status?orgId=1
I agree that this is important to address. I haven't submitted any code recently but I spent a significant amount of time on the 2.31.0 release investigating flakes in the release validation tests.
Andrew
On Tue, Aug 3, 2021 at 10:43 AM Reuven Lax <[email protected]> wrote:
I've noticed recently that our precommit tests are getting flakier and flakier. Recently I had to run Java PreCommit 5 times before I was able to get a clean run. This is frustrating for us as developers, but it is also extremely wasteful of our compute resources.
I started making a list of the flaky tests I've seen. Here are some of the ones I've dealt with just in the past few days; this is not nearly an exhaustive list - I've seen many others before I started recording them. Of the below, failures in ElasticsearchIOTest are by far the most common!
We need to try to make these tests not flaky. Barring that, I think the extremely flaky tests need to be excluded from our presubmit until they can be fixed. Rerunning the precommit over and over again until green is not a good testing strategy.
* org.apache.beam.runners.flink.ReadSourcePortableTest.testExecution[streaming: false]
  <https://ci-beam.apache.org/job/beam_PreCommit_Java_Phrase/3901/testReport/junit/org.apache.beam.runners.flink/ReadSourcePortableTest/testExecution_streaming__false_/>
* org.apache.beam.sdk.io.jms.JmsIOTest.testCheckpointMarkSafety
  <https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/18485/testReport/junit/org.apache.beam.sdk.io.jms/JmsIOTest/testCheckpointMarkSafety/>
* org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInFinishBundleStateful
  <https://ci-beam.apache.org/job/beam_PreCommit_Java_Phrase/3903/testReport/junit/org.apache.beam.sdk.transforms/ParDoLifecycleTest/testTeardownCalledAfterExceptionInFinishBundleStateful/>
* org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testSplit
  <https://ci-beam.apache.org/job/beam_PreCommit_Java_Phrase/3903/testReport/junit/org.apache.beam.sdk.io.elasticsearch/ElasticsearchIOTest/testSplit/>
* org.apache.beam.sdk.io.gcp.datastore.RampupThrottlingFnTest.testRampupThrottler
  <https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/18501/testReport/junit/org.apache.beam.sdk.io.gcp.datastore/RampupThrottlingFnTest/testRampupThrottler/>
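For illustration only, here is a minimal sketch of what temporarily excluding one of the tests above could look like with JUnit 4's @Ignore annotation until the underlying flake is fixed; the choice of test and the reason string are assumptions for the example, not an actual change in the repository:

    import org.junit.Ignore;
    import org.junit.Test;

    public class JmsIOTest {

      // Hypothetical: skip the known-flaky test and point at the tracking issue,
      // then remove the annotation once the flake is resolved.
      @Ignore("Flaky checkpoint test; re-enable once the tracking JIRA is fixed")
      @Test
      public void testCheckpointMarkSafety() {
        // ... existing test body ...
      }
    }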