The issue is with pull requests. IIRC I didn't encounter this problem myself, but I can imagine that a change in core could make the Dataflow precommit fail, and it would be complicated to fix that without GCP credentials.

So, to answer the question: no, I don't think it would help, as long as the flag is not used in CI as well.

On 8/16/21 6:47 PM, Luke Cwik wrote:
Jan, it would be possible to add a flag that says to skip any IT tests that require a cloud service of any kind. Would that work for you?
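
For illustration only: a minimal sketch of what such an opt-in flag could look like at the test level, using a JUnit 4 assumption keyed off a JVM system property. The property name (beam.it.useCloudServices) and the test class are hypothetical, not an existing Beam option.

// Hypothetical sketch: skip cloud-backed integration tests unless explicitly enabled.
import static org.junit.Assume.assumeTrue;

import org.junit.Before;
import org.junit.Test;

public class SomeCloudServiceIT {

  @Before
  public void requireCloudServices() {
    // With the (hypothetical) flag absent, the tests are skipped rather than failed,
    // so contributors without GCP credentials still get a green local run.
    assumeTrue(
        "Skipping: cloud-backed integration tests are disabled",
        Boolean.getBoolean("beam.it.useCloudServices"));
  }

  @Test
  public void readsFromCloudService() {
    // ... test body that talks to a real cloud service ...
  }
}

Running with -Dbeam.it.useCloudServices=true (e.g. in CI) would opt back in; as Jan notes above, CI would also have to set it for the precommits to keep their coverage.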

It turns out that the fix was rolled out and finished about 45 mins ago, so my prior e-mail was already out of date when I sent it. If you had a test that failed on your PR, please feel free to restart the test using the GitHub trigger phrase associated with it.

I reran one of the suites that were perma-red, https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Cron/4059, and it passed.


On Mon, Aug 16, 2021 at 9:29 AM Jan Lukavský <[email protected]> wrote:

    Not directly related to the 'flakiness' discussion of this thread,
    but I think it would be good if pre-commit checks could be run
    locally without GCP credentials.

    On 8/16/21 6:24 PM, Luke Cwik wrote:
    The fix was inadvertently run in dry-run mode, so it didn't make any
    changes. Since the fix was taking a couple of hours or so and it was
    getting late on Friday, people didn't want to start it again till
    today (after the weekend).

    I don't think removing the few tests that run an unbounded pipeline
    on Dataflow is a good idea in the long term. Sure, we can disable
    them and re-enable them when there is an issue that is blocking
    folks.

    On Mon, Aug 16, 2021 at 9:19 AM Andrew Pilloud <[email protected]> wrote:

        The two-hour estimate for the fix has long passed, and we are
        now at 18 days since the last successful run. What is the
        latest estimate?

        It sounds like these tests are primarily testing
        Dataflow, not Beam. They seem like good candidates to remove
        from the precommit (or limit to Dataflow runner changes) even
        after they are fixed.

        On Fri, Aug 13, 2021 at 6:48 PM Luke Cwik <[email protected]> wrote:

            The failure is due to data associated with the
            apache-beam-testing project, which is impacting all the
            Dataflow streaming tests.

            Yes, disabling the tests should have happened weeks ago if:
            1) The fix seemed like it was going to take a long time
            (which was unknown at the time)
            2) We had confidence in test coverage minus Dataflow
            streaming test coverage (which I believe we did)



            On Fri, Aug 13, 2021 at 6:27 PM Andrew Pilloud <[email protected]> wrote:

                Or if a rollback won't fix this, can we disable the
                broken tests?

                On Fri, Aug 13, 2021 at 6:25 PM Andrew Pilloud <[email protected]> wrote:

                    So you can roll back in two hours. Beam has been
                    broken for two weeks. Why isn't a rollback
                    appropriate?

                    On Fri, Aug 13, 2021 at 6:06 PM Luke Cwik <[email protected]> wrote:

                        The test failures that I have seen have been
                        because of BEAM-12676 [1], which is due to a
                        bug impacting Dataflow streaming pipelines for
                        the apache-beam-testing project. From my
                        understanding, the fix is rolling out now and
                        should take another 2 hrs or so. Rolling back
                        master doesn't seem like what we should be
                        doing at the moment.

                        1: https://issues.apache.org/jira/projects/BEAM/issues/BEAM-12676

                        On Fri, Aug 13, 2021 at 5:51 PM Andrew Pilloud <[email protected]> wrote:

                            Both Java and Python precommits are
                            reporting the last successful run as being
                            in July (for both Cron and Precommit), so
                            it looks like changes are being submitted
                            without successful test runs. We probably
                            shouldn't be doing that?

                            https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/
                            https://ci-beam.apache.org/job/beam_PreCommit_Python_Commit/
                            https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Cron/
                            https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Commit/

                            Is there a plan to get this fixed? Should
                            we roll master back to July?

                            On Tue, Aug 3, 2021 at 12:24 PM Tyson Hamilton <[email protected]> wrote:

                                I only realized after sending that I
                                used the IP for the link; that was by
                                accident. Here is the proper domain
                                link:
                                http://metrics.beam.apache.org/d/D81lW0pmk/post-commit-test-reliability?orgId=1

                                On Tue, Aug 3, 2021 at 3:22 PM Tyson Hamilton <[email protected]> wrote:

                                    The way I've investigated
                                    precommit flake stability is by
                                    looking at the 'Post-commit Test
                                    Reliability' [1] dashboard (hah!).
                                    Confusingly, there is a cron job
                                    that runs precommits and those
                                    results are tracked in the
                                    post-commit dashboard. This week,
                                    Java is about 50% green for the
                                    pre-commit cron job, which is not
                                    great.

                                    The plugin we installed for
                                    tracking the most flaky tests for
                                    a job doesn't scale well to the
                                    number of tests present in the
                                    precommit cron job. This could be
                                    an area of improvement to help add
                                    granularity and visibility to the
                                    flakiest tests over some period of
                                    time.


                                    [1]: http://104.154.241.245/d/D81lW0pmk/post-commit-test-reliability?orgId=1
                                    (look for "PreCommit_Java_Cron")

                                    On Tue, Aug 3, 2021 at 2:24 PM Andrew Pilloud <[email protected]> wrote:

                                        Our metrics show Java is
                                        nearly free from flakes, that
                                        Go has significant flakes, and
                                        that Python is effectively
                                        broken. It appears the metrics
                                        may be missing coverage on the
                                        Java side. The dashboard is
                                        here:
                                        http://104.154.241.245/d/McTAiu0ik/stability-critical-jobs-status?orgId=1


                                        I agree that this is
                                        important to address. I
                                        haven't submitted any code
                                        recently but I spent a
                                        significant amount of time on
                                        the 2.31.0 release
                                        investigating flakes in the
                                        release validation tests.

                                        Andrew

                                        On Tue, Aug 3, 2021 at 10:43 AM Reuven Lax <[email protected]> wrote:

                                            I've noticed recently
                                            that our precommit tests
                                            are getting flakier and
                                            flakier. Recently I had
                                            to run Java PreCommit 5
                                            times before I was able
                                            to get a clean run. This
                                            is frustrating for us as
                                            developers, but it also
                                            is extremely wasteful of
                                            our compute resources.

                                            I started making a list
                                            of the flaky tests I've
                                            seen. Here are some of
                                            the ones I've dealt with
                                            just the past few days;
                                            this is not nearly an
                                            exhaustive list - I've
                                            seen many others before I
                                            started recording them.
                                            Of the below, failures in
                                            ElasticsearchIOTest are
                                            by far the most common!

                                            We need to try and make
                                            these tests not flaky.
                                            Barring that, I think the
                                            extremely flaky tests
                                            need to be excluded from
                                            our presubmit until they
                                            can be fixed. Rerunning
                                            the precommit over and
                                            over again till green is
                                            not a good testing strategy.

                                             * org.apache.beam.runners.flink.ReadSourcePortableTest.testExecution[streaming: false]
                                               https://ci-beam.apache.org/job/beam_PreCommit_Java_Phrase/3901/testReport/junit/org.apache.beam.runners.flink/ReadSourcePortableTest/testExecution_streaming__false_/

                                             * org.apache.beam.sdk.io.jms.JmsIOTest.testCheckpointMarkSafety
                                               https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/18485/testReport/junit/org.apache.beam.sdk.io.jms/JmsIOTest/testCheckpointMarkSafety/

                                             * org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInFinishBundleStateful
                                               https://ci-beam.apache.org/job/beam_PreCommit_Java_Phrase/3903/testReport/junit/org.apache.beam.sdk.transforms/ParDoLifecycleTest/testTeardownCalledAfterExceptionInFinishBundleStateful/

                                             * org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testSplit
                                               https://ci-beam.apache.org/job/beam_PreCommit_Java_Phrase/3903/testReport/junit/org.apache.beam.sdk.io.elasticsearch/ElasticsearchIOTest/testSplit/

                                             * org.apache.beam.sdk.io.gcp.datastore.RampupThrottlingFnTest.testRampupThrottler
                                               https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/18501/testReport/junit/org.apache.beam.sdk.io.gcp.datastore/RampupThrottlingFnTest/testRampupThrottler/
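
                                            For illustration only: a
                                            minimal sketch of the kind
                                            of temporary exclusion
                                            mentioned above, assuming
                                            JUnit 4. The class and
                                            method mirror one of the
                                            flaky tests listed above,
                                            and the JIRA id is a
                                            placeholder:

// Hypothetical sketch: temporarily exclude a flaky test from presubmit until it is fixed.
// The JIRA id below is a placeholder, not a real issue number.
import org.junit.Ignore;
import org.junit.Test;

public class ElasticsearchIOTest {

  @Ignore("Flaky; tracked in BEAM-XXXXX. Re-enable once the underlying flake is fixed.")
  @Test
  public void testSplit() {
    // ... original test body ...
  }
}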
