This is great news!

On Fri, 12 Feb 2021 at 13:20, Lari Hotari <lari.hot...@sagire.fi> wrote:
> Hi all,
>
> There has been some great progress in fixing the flaky tests. It seems that
> there's more stability in the builds after more fixes have been merged to
> master.
> This work has an impact. Thank you for the contributions.
>
> Our work is not over. There's a lot more to fix. Please continue
> contributing to make Pulsar CI better.
>
> Here's the list of open issues:
> https://github.com/apache/pulsar/issues?q=is%3Aissue+is%3Aopen+Flaky-test+sort%3Aupdated-desc
>
> As usual, please comment on the issue to assign it to yourself.
> You can join Pulsar Slack's #testing channel to share tips & tricks around
> fixing the flaky tests or for asking questions.
>
> Keep up the good work!
>
> BR, Lari
>
> On Wed, Feb 3, 2021 at 9:07 PM Lari Hotari <lari.hot...@sagire.fi> wrote:
>
> > Hi all,
> >
> > Here's the next batch of flaky test issues:
> >
> > #9459 Flaky-test: PulsarFunctionsTest.testDebeziumPostgreSqlSource
> > <https://github.com/apache/pulsar/issues/9459>
> >
> > #9458 Flaky-test: ReplicatorTest.testReplication
> > <https://github.com/apache/pulsar/issues/9458>
> >
> > #9457 Flaky-test:ReplicatorTest.testReplicatorOnPartitionedTopic
> > <https://github.com/apache/pulsar/issues/9457>
> >
> > #9456 Flaky-test: TestProxy
> > <https://github.com/apache/pulsar/issues/9456>
> >
> > #9455 Flaky-test: PulsarFunctionsTest.testCustomSerdeFunction
> > <https://github.com/apache/pulsar/issues/9455>
> >
> > #9454 Flaky-test: CLITest.testCreateSubscriptionCommand
> > <https://github.com/apache/pulsar/issues/9454>
> >
> > #9453 Flaky-test: PulsarFunctionsProcessTest.testAvroSchemaFunction
> > <https://github.com/apache/pulsar/issues/9453>
> >
> > #9452 Flaky-test: org.apache.pulsar.tests.integration.SmokeTest.setup
> > <https://github.com/apache/pulsar/issues/9452>
> >
> > #9451 Flaky-test:
> > SimpleProducerConsumerTest.testConcurrentConsumerReceiveWhileReconnect
> > <https://github.com/apache/pulsar/issues/9451>
> >
> > #9450 Flaky-test:
> > org.apache.pulsar.broker.service.ReplicatorTest.testResetCursorNotFail
> > <https://github.com/apache/pulsar/issues/9450>
> >
> > The ReplicatorTest
> > (https://github.com/apache/pulsar/blob/master/pulsar-broker/src/test/java/org/apache/pulsar/broker/service/ReplicatorTest.java)
> > is contributing to a lot of failures, here's a complete list of example
> > failures: https://gist.github.com/lhotari/ff58a94ef42bc6ed41165ed10c7d1cfd .
> > It would be one of the fixes that would have really great impact. I filed
> > 2 issues about ReplicatorTest.
> >
> > Keep up the good work in fixing flaky tests. There's again a lot of great
> > contributions. Thank you!
> >
> > BR, Lari
> >
> > On Wed, Feb 3, 2021 at 6:35 AM Lari Hotari <lari.hot...@sagire.fi> wrote:
> >
> >> Hi all,
> >>
> >> There are links to recent failures of a particular flaky test in the
> >> recently reported flaky test GitHub issues
> >> (https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc+is%3Aissue+is%3Aopen).
> >>
> >> Example from https://github.com/apache/pulsar/issues/9437 :
> >> example failure 2021-02-01T09:41:10.0922161Z
> >> <https://github.com/apache/pulsar/runs/1804628430?check_suite_focus=true#step:6:12322>
> >> example failure 2021-01-29T07:51:57.9989389Z
> >> <https://github.com/apache/pulsar/runs/1789838309?check_suite_focus=true#step:6:18491>
> >> example failure 2021-01-28T02:42:14.3316285Z
> >> <https://github.com/apache/pulsar/runs/1781184081?check_suite_focus=true#step:6:18415>
> >> example failure 2021-01-27T21:44:09.7619772Z
> >> <https://github.com/apache/pulsar/runs/1778470820?check_suite_focus=true#step:6:6213>
> >>
> >> These links point to the exact line in the build log.
> >> For example:
> >> https://github.com/apache/pulsar/runs/1804628430?check_suite_focus=true#step:6:12322
> >>
> >> *When opening this link, it should navigate directly to line number
> >> 12322 in step 6 of the workflow run log.*
> >>
> >> However, there's a bug in the GitHub UI: this doesn't work if
> >> the link is clicked from a page within github.com .
> >> The parameters and hash of the URL get lost and the focus doesn't go to
> >> the line where the error happened.
> >>
> >> *The workaround is to open the "example failure" links in a new
> >> tab/window by CTRL-click (Windows, Linux) or CMD-click (macOS).*
> >>
> >> I hope this helps investigate the flaky test failures more efficiently!
> >>
> >> BR,
> >>
> >> Lari
> >>
> >> On Tue, Feb 2, 2021 at 7:35 PM Lari Hotari <lari.hot...@sagire.fi> wrote:
> >>
> >>> The good progress continues!
> >>> One way to see the issue & PR activity where "flaky" is mentioned:
> >>> https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc
> >>> Thank you to the contributors and PR reviewers!
> >>>
> >>> Here's the next flaky test for someone to fix:
> >>> https://github.com/apache/pulsar/issues/6646 (reported a long time ago,
> >>> I added some examples of recent failures)
> >>> It's about PulsarFunctionsTest. This test class contributes to a lot of
> >>> failures. I have uploaded a list of failures to
> >>> https://gist.github.com/lhotari/9bae3e16674c297a6bbc2b4831515a74 .
> >>> I haven't validated that all failures are from flaky test runs. It's
> >>> possible that some are from a build which broke the test.
> >>>
> >>> 1) Who could pick up fixing the multiple issues in PulsarFunctionsTest,
> >>> https://github.com/apache/pulsar/issues/6646 ? You can comment directly
> >>> on issue #6646 and start working on it if you wish. It would be a really
> >>> important fix to have.
> >>>
> >>> 2) Another one: https://github.com/apache/pulsar/issues/9431
> >>>
> >>> 3) The 3rd one might be a quick fix. It's an NPE in cleanup:
> >>> https://github.com/apache/pulsar/issues/9432
> >>>
> >>> I'm looking for the sprinting to continue. It seems that the issues get
> >>> fixed sooner than I can report more of them. :)
> >>>
> >>> BR, Lari
> >>>
> >>> On Mon, Feb 1, 2021 at 8:18 PM Lari Hotari <lari.hot...@sagire.fi> wrote:
> >>>
> >>>> Dear Pulsar community members,
> >>>>
> >>>> Thanks for picking up the work so quickly! I noticed that at least
> >>>> Renkai and Michael already pushed pull requests to fix the flaky tests
> >>>> that were mentioned in the previous email. Some of the PRs have already
> >>>> been merged.
> >>>>
> >>>> Here are 3 more flaky tests with links to a lot of example failures:
> >>>> https://github.com/apache/pulsar/issues/9407
> >>>> https://github.com/apache/pulsar/issues/9408
> >>>> https://github.com/apache/pulsar/issues/9409
> >>>>
> >>>> I'll report more flaky tests tomorrow. Today I was working on some
> >>>> tooling to mine the logs and gather some statistics.
> >>>>
> >>>> I parsed the logs of the last few days and these are the test methods
> >>>> that fail the most:
> >>>>
> >>>> 273 org.apache.pulsar.tests.integration.utils.DockerUtils$2.onComplete
> >>>> 102 org.apache.pulsar.compaction.CompactionTest.cleanup
> >>>> 81 org.apache.pulsar.admin.cli.PulsarAdminToolTest.topics
> >>>> 51 org.apache.pulsar.broker.loadbalance.LoadBalancerTest.testLeaderElection
> >>>> 45 org.apache.pulsar.io.PulsarFunctionE2ETest.shutdown
> >>>> 40 org.apache.pulsar.broker.service.ConsumedLedgersTrimTest.testConsumedLedgersTrimNoSubscriptions
> >>>> 36 org.apache.pulsar.websocket.proxy.ProxyPublishConsumeTest.cleanup
> >>>> 30 org.apache.pulsar.functions.worker.PulsarFunctionLocalRunTest.shutdown
> >>>> 30 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaExclamationTopicPatternFunction
> >>>> 29 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaExclamationFunction
> >>>> 27 org.apache.pulsar.client.api.v1.V1_ProducerConsumerTest.testConcurrentConsumerReceiveWhileReconnect
> >>>> 26 org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector.lambda$retryOperation$3
> >>>> 22 org.apache.pulsar.broker.service.ReplicatorTest.testResetCursorNotFail
> >>>> 22 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaLoggingFunction
> >>>> 21 org.apache.pulsar.tests.integration.SmokeTest.setup
> >>>> 20 org.apache.pulsar.client.impl.MessageIdTest.testChecksumReconnection
> >>>> 20 org.apache.pulsar.client.impl.MessageIdTest.testChecksumVersionComptability
> >>>> 19 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonFunctionLocalRun
> >>>> 19 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testAutoSchemaFunction
> >>>> 14 org.apache.pulsar.client.api.SimpleProducerConsumerTest.testConcurrentConsumerReceiveWhileReconnect
> >>>> 14 org.apache.pulsar.broker.service.MessagePublishBufferThrottleTest.testBlockByPublishRateLimiting
> >>>> 14 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testSlidingCountWindowTest
> >>>> 13 org.apache.pulsar.tests.integration.backwardscompatibility.ClientTest2_2.testResetCursorCompatibility
> >>>> 12 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonPublishFunction
> >>>> 12 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationTopicPatternFunction
> >>>> 12 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationFunctionWithExtraDeps
> >>>> 12 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationZipFunction
> >>>> 12 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonFunctionNegAck
> >>>> 12 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationFunction
> >>>> 12 org.apache.pulsar.compaction.CompactorTest.cleanup
> >>>> 12 org.apache.pulsar.broker.service.BrokerServiceAutoSubscriptionCreationTest.cleanupTest
> >>>> 12 org.apache.pulsar.websocket.proxy.ProxyAuthenticationTest.cleanup
> >>>> 12 org.apache.pulsar.websocket.proxy.v1.V1_ProxyAuthenticationTest.cleanup
> >>>> 12 org.apache.pulsar.client.impl.BatchMessageIndexAckTest.testBatchMessageIndexAckForSharedSubscription
> >>>> 11 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaPublishFunction
> >>>> 11 org.apache.pulsar.broker.loadbalance.AntiAffinityNamespaceGroupTest.testBrokerSelectionForAntiAffinityGroup
> >>>>
> >>>> I'll report more flaky tests after I have checked that my tooling is
> >>>> producing correct results.
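A ranking like the one above could be produced roughly as follows. This is only a minimal Python sketch, not the tooling mentioned in the thread (which is not shown here). The log line shape is an assumption that resembles Maven Surefire console output, and the names `FAILURE_LINE`, `count_failures` and `format_report` are hypothetical.

```python
import re
from collections import Counter

# ASSUMPTION: failures appear in the build log on lines shaped like
#   [ERROR] testMethod(org.example.SomeTest)  Time elapsed: 1.2 s  <<< FAILURE!
# possibly preceded by a timestamp. The real Pulsar CI logs may differ.
FAILURE_LINE = re.compile(
    r"\[ERROR\]\s+(?P<method>[\w$]+)\((?P<cls>[\w.$]+)\).*<<<\s+(?:FAILURE|ERROR)!"
)


def count_failures(lines):
    """Count failures per fully qualified test method (Class.method)."""
    counts = Counter()
    for line in lines:
        match = FAILURE_LINE.search(line)
        if match:
            counts[f"{match['cls']}.{match['method']}"] += 1
    return counts


def format_report(counts, limit=None):
    """Return 'count name' rows, most frequent failures first."""
    return [f"{n} {name}" for name, n in counts.most_common(limit)]


if __name__ == "__main__":
    # Made-up sample lines for illustration only, not real Pulsar logs.
    sample = [
        "[ERROR] testA(org.example.FooTest)  Time elapsed: 0.5 s  <<< FAILURE!",
        "[ERROR] testA(org.example.FooTest)  Time elapsed: 0.7 s  <<< ERROR!",
        "[INFO] Tests run: 3, Failures: 0",
        "[ERROR] testB(org.example.BarTest)  Time elapsed: 1.0 s  <<< FAILURE!",
    ]
    for row in format_report(count_failures(sample)):
        print(row)
```

Counting by `Class.method` keeps setup and cleanup methods visible as separate entries, which matches how the list above reports them. As noted in the thread, a high count does not prove flakiness on its own, since some failures may come from builds that genuinely broke the test.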
> >>>>
> >>>> To contribute, please pick a flaky test to fix from the reported ones:
> >>>> https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc+is%3Aopen
> >>>>
> >>>> We can all join the #testing channel on Pulsar Slack to share detailed
> >>>> tips and tricks while working on fixing flaky tests.
> >>>>
> >>>> See you,
> >>>>
> >>>> BR, Lari
> >>>>
> >>>> On Fri, Jan 29, 2021 at 8:26 PM Lari Hotari <lari.hot...@sagire.fi> wrote:
> >>>>
> >>>>> Dear Pulsar community members,
> >>>>>
> >>>>> In order to improve our CI, we will have to fix the flaky tests. In
> >>>>> some cases it might be necessary to replace an existing test with a
> >>>>> redesigned test.
> >>>>>
> >>>>> The draft PIP "Changes to flaky test handling" document
> >>>>> <https://docs.google.com/document/d/10lmn4pW1IsT_8D1ZE0vMjASX0HhjdGdjB794iyScwns/edit?usp=sharing>
> >>>>> lists the top 10 flaky tests. A lot of them have already been addressed
> >>>>> by pull requests in the past week or so.
> >>>>>
> >>>>> This is the list of recent PRs that fix flaky tests from the top 10
> >>>>> flaky tests list:
> >>>>> https://github.com/apache/pulsar/pull/9286
> >>>>> https://github.com/apache/pulsar/pull/9243
> >>>>> https://github.com/apache/pulsar/pull/9258
> >>>>> https://github.com/apache/pulsar/pull/9356
> >>>>>
> >>>>> These are the GH issues for the remaining ones in the top 10 flaky
> >>>>> tests list:
> >>>>> https://github.com/apache/pulsar/issues/6368
> >>>>> https://github.com/apache/pulsar/issues/9369
> >>>>> https://github.com/apache/pulsar/issues/9368
> >>>>>
> >>>>> If you would like to help fix flaky tests, you can pick one of the
> >>>>> open issues above. Just add a comment on the issue when you start
> >>>>> working on it so that we can coordinate activities.
> >>>>>
> >>>>> It is also helpful to report a flaky test when you encounter one. I've
> >>>>> been using this type of template for reporting a flaky test:
> >>>>> https://gist.github.com/lhotari/a5c67359b362b4f3d8729330d65a2298 .
> >>>>> The issues #9368 and #9369 have been reported using this template.
> >>>>> Search for the test name before reporting so that we don't end up with
> >>>>> duplicates.
> >>>>>
> >>>>> The issues #6368, #9369 and #9368 are the next 3 important issues to
> >>>>> fix. I'm planning to create a more extensive list of the flaky failures
> >>>>> so that we can target the most flaky ones when we continue fixing the
> >>>>> flaky tests. I have some scripts in development to assist in mining the
> >>>>> Pulsar GitHub Actions workflow run logs.
> >>>>>
> >>>>> This is a search to find flaky issues in Pulsar GH issues:
> >>>>> https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc+is%3Aopen
> >>>>>
> >>>>> Looking forward to the contributions for fixing flaky tests,
> >>>>>
> >>>>> BR,
> >>>>>
> >>>>> Lari

--
*Thanks*
*Yuvaraj L*