This is great news!

On Fri, 12 Feb 2021 at 13:20, Lari Hotari <lari.hot...@sagire.fi> wrote:

> Hi all,
>
> There has been some great progress in fixing the flaky tests. It seems that
> there's more stability in the builds after more fixes have been merged to
> master.
> This work has an impact. Thank you for the contributions.
>
> Our work is not over. There's a lot more to fix. Please continue
> contributing to make Pulsar CI better.
>
> Here's the list of open issues:
>
> https://github.com/apache/pulsar/issues?q=is%3Aissue+is%3Aopen+Flaky-test+sort%3Aupdated-desc
>
> As usual, please comment on the issue to assign it to yourself.
> You can join Pulsar Slack's #testing channel to share tips & tricks for
> fixing the flaky tests or to ask questions.
>
> Keep up the good work!
>
> BR, Lari
>
> On Wed, Feb 3, 2021 at 9:07 PM Lari Hotari <lari.hot...@sagire.fi> wrote:
>
> > Hi all,
> >
> > Here's the next batch of flaky test issues:
> >
> > #9459 Flaky-test: PulsarFunctionsTest.testDebeziumPostgreSqlSource
> > <https://github.com/apache/pulsar/issues/9459>
> >
> > #9458 Flaky-test: ReplicatorTest.testReplication
> > <https://github.com/apache/pulsar/issues/9458>
> >
> > #9457 Flaky-test: ReplicatorTest.testReplicatorOnPartitionedTopic
> > <https://github.com/apache/pulsar/issues/9457>
> >
> > #9456 Flaky-test: TestProxy
> > <https://github.com/apache/pulsar/issues/9456>
> >
> > #9455 Flaky-test: PulsarFunctionsTest.testCustomSerdeFunction
> > <https://github.com/apache/pulsar/issues/9455>
> >
> > #9454 Flaky-test: CLITest.testCreateSubscriptionCommand
> > <https://github.com/apache/pulsar/issues/9454>
> >
> > #9453 Flaky-test: PulsarFunctionsProcessTest.testAvroSchemaFunction
> > <https://github.com/apache/pulsar/issues/9453>
> >
> > #9452 Flaky-test: org.apache.pulsar.tests.integration.SmokeTest.setup
> > <https://github.com/apache/pulsar/issues/9452>
> >
> > #9451 Flaky-test:
> > SimpleProducerConsumerTest.testConcurrentConsumerReceiveWhileReconnect
> > <https://github.com/apache/pulsar/issues/9451>
> >
> > #9450 Flaky-test:
> > org.apache.pulsar.broker.service.ReplicatorTest.testResetCursorNotFail
> > <https://github.com/apache/pulsar/issues/9450>
> >
> > The ReplicatorTest
> > (https://github.com/apache/pulsar/blob/master/pulsar-broker/src/test/java/org/apache/pulsar/broker/service/ReplicatorTest.java)
> > is contributing to a lot of failures. Here's a complete list of example
> > failures: https://gist.github.com/lhotari/ff58a94ef42bc6ed41165ed10c7d1cfd .
> > Fixing it would have a really great impact. I filed 2 issues about
> > ReplicatorTest.
> >
> > Keep up the good work in fixing flaky tests. There are again a lot of
> > great contributions. Thank you!
> >
> > BR, Lari
> >
> >
> >
> >
> > On Wed, Feb 3, 2021 at 6:35 AM Lari Hotari <lari.hot...@sagire.fi> wrote:
> >
> >> Hi all,
> >>
> >> There are links to recent failures of a particular flaky test in the
> >> recently reported flaky test GitHub issues
> >> (https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc+is%3Aissue+is%3Aopen).
> >>
> >> Example from https://github.com/apache/pulsar/issues/9437 :
> >> example failure 2021-02-01T09:41:10.0922161Z
> >> <https://github.com/apache/pulsar/runs/1804628430?check_suite_focus=true#step:6:12322>
> >> example failure 2021-01-29T07:51:57.9989389Z
> >> <https://github.com/apache/pulsar/runs/1789838309?check_suite_focus=true#step:6:18491>
> >> example failure 2021-01-28T02:42:14.3316285Z
> >> <https://github.com/apache/pulsar/runs/1781184081?check_suite_focus=true#step:6:18415>
> >> example failure 2021-01-27T21:44:09.7619772Z
> >> <https://github.com/apache/pulsar/runs/1778470820?check_suite_focus=true#step:6:6213>
> >>
> >> These links point to the exact line in the build log.
> >> For example:
> >>
> >> https://github.com/apache/pulsar/runs/1804628430?check_suite_focus=true#step:6:12322
> >>
> >> *When opening this link, it should navigate directly to the line number
> >> 12322 in step 6 of the workflow run log.*
> >>
> >> However, there's a bug in the GitHub UI: this doesn't work if
> >> the link is clicked from a page within github.com.
> >> The parameters and hash of the URL get lost, and the focus doesn't go to
> >> the line where the error happened.
> >>
> >> *The workaround is to open the "example failure" links in a new
> >> tab/window by CTRL-click (Windows, Linux) or CMD-click (macOS).*
> >>
> >> I hope this helps investigate the flaky test failures more efficiently!
> >>
> >> BR,
> >>
> >> Lari
> >>
> >> On Tue, Feb 2, 2021 at 7:35 PM Lari Hotari <lari.hot...@sagire.fi> wrote:
> >>
> >>> The good progress continues!
> >>> One way to see the issue & PR activity where "flaky" is mentioned:
> >>> https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc
> >>> Thank you to the contributors and PR reviewers!
> >>>
> >>> Here's the next flaky test for someone to fix:
> >>> https://github.com/apache/pulsar/issues/6646 (reported a long time ago;
> >>> I added some examples of recent failures)
> >>> It's about PulsarFunctionsTest. This test class contributes to a lot of
> >>> failures. I have uploaded a list of failures to
> >>> https://gist.github.com/lhotari/9bae3e16674c297a6bbc2b4831515a74 .
> >>> I haven't validated that all failures are from flaky test runs. It's
> >>> possible that some are from a build which broke the test.
> >>>
> >>> 1) Who could pick up fixing the multiple issues in PulsarFunctionsTest,
> >>> https://github.com/apache/pulsar/issues/6646 ? You can comment directly
> >>> on issue #6646 and start working on it if you wish. It would be a really
> >>> important fix to have.
> >>>
> >>> 2) Another one: https://github.com/apache/pulsar/issues/9431
> >>>
> >>> 3) The 3rd one might be a quick fix; it's an NPE in cleanup:
> >>> https://github.com/apache/pulsar/issues/9432
> >>>
> >>> I'm looking forward to the sprint continuing. It seems that the issues
> >>> get fixed sooner than I can report more of them. :)
> >>>
> >>> BR, Lari
> >>>
> >>>
> >>> On Mon, Feb 1, 2021 at 8:18 PM Lari Hotari <lari.hot...@sagire.fi>
> >>> wrote:
> >>>
> >>>> Dear Pulsar community members,
> >>>>
> >>>> Thanks for picking up the work so quickly! I noticed that at least
> >>>> Renkai and Michael already pushed pull requests to fix the flaky tests
> >>>> that were mentioned in the previous email. Some of the PRs have already
> >>>> been merged.
> >>>>
> >>>> Here are 3 more flaky tests with links to a lot of example failures:
> >>>> https://github.com/apache/pulsar/issues/9407
> >>>> https://github.com/apache/pulsar/issues/9408
> >>>> https://github.com/apache/pulsar/issues/9409
> >>>>
> >>>> I'll report more flaky tests tomorrow. Today I was working on some
> >>>> tooling to mine the logs and gather some statistics.
> >>>>
> >>>> I parsed the logs of the last few days and these are the test methods
> >>>> that fail the most:
> >>>>
> >>>> 273     org.apache.pulsar.tests.integration.utils.DockerUtils$2.onComplete
> >>>> 102     org.apache.pulsar.compaction.CompactionTest.cleanup
> >>>> 81      org.apache.pulsar.admin.cli.PulsarAdminToolTest.topics
> >>>> 51      org.apache.pulsar.broker.loadbalance.LoadBalancerTest.testLeaderElection
> >>>> 45      org.apache.pulsar.io.PulsarFunctionE2ETest.shutdown
> >>>> 40      org.apache.pulsar.broker.service.ConsumedLedgersTrimTest.testConsumedLedgersTrimNoSubscriptions
> >>>> 36      org.apache.pulsar.websocket.proxy.ProxyPublishConsumeTest.cleanup
> >>>> 30      org.apache.pulsar.functions.worker.PulsarFunctionLocalRunTest.shutdown
> >>>> 30      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaExclamationTopicPatternFunction
> >>>> 29      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaExclamationFunction
> >>>> 27      org.apache.pulsar.client.api.v1.V1_ProducerConsumerTest.testConcurrentConsumerReceiveWhileReconnect
> >>>> 26      org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector.lambda$retryOperation$3
> >>>> 22      org.apache.pulsar.broker.service.ReplicatorTest.testResetCursorNotFail
> >>>> 22      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaLoggingFunction
> >>>> 21      org.apache.pulsar.tests.integration.SmokeTest.setup
> >>>> 20      org.apache.pulsar.client.impl.MessageIdTest.testChecksumReconnection
> >>>> 20      org.apache.pulsar.client.impl.MessageIdTest.testChecksumVersionComptability
> >>>> 19      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonFunctionLocalRun
> >>>> 19      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testAutoSchemaFunction
> >>>> 14      org.apache.pulsar.client.api.SimpleProducerConsumerTest.testConcurrentConsumerReceiveWhileReconnect
> >>>> 14      org.apache.pulsar.broker.service.MessagePublishBufferThrottleTest.testBlockByPublishRateLimiting
> >>>> 14      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testSlidingCountWindowTest
> >>>> 13      org.apache.pulsar.tests.integration.backwardscompatibility.ClientTest2_2.testResetCursorCompatibility
> >>>> 12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonPublishFunction
> >>>> 12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationTopicPatternFunction
> >>>> 12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationFunctionWithExtraDeps
> >>>> 12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationZipFunction
> >>>> 12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonFunctionNegAck
> >>>> 12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationFunction
> >>>> 12      org.apache.pulsar.compaction.CompactorTest.cleanup
> >>>> 12      org.apache.pulsar.broker.service.BrokerServiceAutoSubscriptionCreationTest.cleanupTest
> >>>> 12      org.apache.pulsar.websocket.proxy.ProxyAuthenticationTest.cleanup
> >>>> 12      org.apache.pulsar.websocket.proxy.v1.V1_ProxyAuthenticationTest.cleanup
> >>>> 12      org.apache.pulsar.client.impl.BatchMessageIndexAckTest.testBatchMessageIndexAckForSharedSubscription
> >>>> 11      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaPublishFunction
> >>>> 11      org.apache.pulsar.broker.loadbalance.AntiAffinityNamespaceGroupTest.testBrokerSelectionForAntiAffinityGroup
> >>>>
> >>>> I'll report more flaky tests after I have checked that my tooling is
> >>>> producing correct results.
> >>>>
> >>>> For contributing to fix flaky tests, please pick a flaky test for
> >>>> fixing from the reported ones:
> >>>>
> >>>> https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc+is%3Aopen
> >>>>
> >>>> We can all join the #testing channel on Pulsar Slack to share detailed
> >>>> tips and tricks while working on fixing flaky tests.
> >>>>
> >>>> See you,
> >>>>
> >>>> BR, Lari
> >>>>
> >>>>
> >>>> On Fri, Jan 29, 2021 at 8:26 PM Lari Hotari <lari.hot...@sagire.fi>
> >>>> wrote:
> >>>>
> >>>>> Dear Pulsar community members,
> >>>>>
> >>>>> In order to improve our CI, we will have to fix the flaky tests. In
> >>>>> some cases it might be necessary to replace an existing test with a
> >>>>> redesigned test.
> >>>>>
> >>>>> The draft PIP "Changes to flaky test handling" document
> >>>>> <https://docs.google.com/document/d/10lmn4pW1IsT_8D1ZE0vMjASX0HhjdGdjB794iyScwns/edit?usp=sharing>
> >>>>> lists the top 10 flaky tests. A lot of them have already been addressed
> >>>>> by pull requests in the past week or so.
> >>>>>
> >>>>> This is the list of recent PRs that fix flaky tests from the top 10
> >>>>> flaky tests list:
> >>>>> https://github.com/apache/pulsar/pull/9286
> >>>>> https://github.com/apache/pulsar/pull/9243
> >>>>> https://github.com/apache/pulsar/pull/9258
> >>>>> https://github.com/apache/pulsar/pull/9356
> >>>>>
> >>>>> These are the GH issues for the remaining ones in the top 10 flaky
> >>>>> tests list:
> >>>>> https://github.com/apache/pulsar/issues/6368
> >>>>> https://github.com/apache/pulsar/issues/9369
> >>>>> https://github.com/apache/pulsar/issues/9368
> >>>>>
> >>>>> If you would like to help fix flaky tests, you can pick one of the
> >>>>> open issues above. Just add a comment on the issue when you start
> >>>>> working on it so that we can coordinate activities.
> >>>>>
> >>>>> It is also helpful to report a flaky test when you encounter one. I've
> >>>>> been using this type of template for reporting a flaky test:
> >>>>> https://gist.github.com/lhotari/a5c67359b362b4f3d8729330d65a2298 .
> >>>>> The issues #9368 and #9369 have been reported using this template.
> >>>>> Search for the test name before reporting so that we don't end up with
> >>>>> duplicates.
> >>>>>
> >>>>> The issues #6368, #9369 and #9368 are the next 3 important issues to
> >>>>> fix. I'm planning to create a more extensive list of the flaky failures
> >>>>> so that we can target the most flaky ones when we continue fixing the
> >>>>> flaky tests. I have some scripts in development to assist in mining the
> >>>>> Pulsar GitHub Actions workflow run logs.
> >>>>>
> >>>>> This is a search to find flaky issues in Pulsar GH issues:
> >>>>>
> >>>>> https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc+is%3Aopen
> >>>>>
> >>>>> Looking forward to the contributions for fixing flaky tests,
> >>>>>
> >>>>> BR,
> >>>>>
> >>>>> Lari
> >>>>>
> >>>>
>
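
For anyone who wants to reproduce a per-method failure tally like the one in the Feb 1 message quoted above, here is a minimal sketch. It is not the tooling mentioned in the thread, which isn't shown here. It assumes the failing test method names have already been extracted from the workflow logs, one fully qualified name per failure; the function names tally_failures and format_report are made up for illustration.

```python
from collections import Counter


def tally_failures(failed_methods):
    """Count failures per test method.

    `failed_methods` is assumed to be an iterable of fully qualified
    method names, one entry per observed failure (hypothetical input;
    extracting these names from the raw logs is not covered here).
    Returns (count, name) pairs, most frequent first.
    """
    counts = Counter(name.strip() for name in failed_methods if name.strip())
    # Sort by descending count, then by name so that ties are stable.
    return sorted(
        ((n, name) for name, n in counts.items()),
        key=lambda pair: (-pair[0], pair[1]),
    )


def format_report(rows):
    """Render the rows as 'count<TAB>name' lines, like the quoted list."""
    return "\n".join(f"{n}\t{name}" for n, name in rows)
```

Calling format_report(tally_failures(names)) on the extracted names gives one line per method, ordered by failure count, which makes it easy to see which tests to fix first.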


-- 
*Thanks*

*Yuvaraj L*
