Hi all,

Here's the next batch of flaky test issues:

#9459 Flaky-test: PulsarFunctionsTest.testDebeziumPostgreSqlSource
<https://github.com/apache/pulsar/issues/9459>
#9458 Flaky-test: ReplicatorTest.testReplication
<https://github.com/apache/pulsar/issues/9458>
#9457 Flaky-test:ReplicatorTest.testReplicatorOnPartitionedTopic
<https://github.com/apache/pulsar/issues/9457>
#9456 Flaky-test: TestProxy
<https://github.com/apache/pulsar/issues/9456>
#9455 Flaky-test: PulsarFunctionsTest.testCustomSerdeFunction
<https://github.com/apache/pulsar/issues/9455>
#9454 Flaky-test: CLITest.testCreateSubscriptionCommand
<https://github.com/apache/pulsar/issues/9454>
#9453 Flaky-test: PulsarFunctionsProcessTest.testAvroSchemaFunction
<https://github.com/apache/pulsar/issues/9453>
#9452 Flaky-test: org.apache.pulsar.tests.integration.SmokeTest.setup
<https://github.com/apache/pulsar/issues/9452>
#9451 Flaky-test: SimpleProducerConsumerTest.testConcurrentConsumerReceiveWhileReconnect
<https://github.com/apache/pulsar/issues/9451>
#9450 Flaky-test: org.apache.pulsar.broker.service.ReplicatorTest.testResetCursorNotFail
<https://github.com/apache/pulsar/issues/9450>

The ReplicatorTest
(https://github.com/apache/pulsar/blob/master/pulsar-broker/src/test/java/org/apache/pulsar/broker/service/ReplicatorTest.java)
is contributing to a lot of failures. Here's a complete list of example failures:
https://gist.github.com/lhotari/ff58a94ef42bc6ed41165ed10c7d1cfd .
Fixing it would have a really great impact. I filed 2 issues about ReplicatorTest.

Keep up the good work in fixing flaky tests. Once again there have been a lot
of great contributions. Thank you!

BR, Lari

On Wed, Feb 3, 2021 at 6:35 AM Lari Hotari <lari.hot...@sagire.fi> wrote:

> Hi all,
>
> There are links to recent failures of a particular flaky test in the
> recently reported flaky test GitHub issues (
> https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc+is%3Aissue+is%3Aopen
> ).
>
> Example from https://github.com/apache/pulsar/issues/9437 :
> example failure 2021-02-01T09:41:10.0922161Z
> <https://github.com/apache/pulsar/runs/1804628430?check_suite_focus=true#step:6:12322>
> example failure 2021-01-29T07:51:57.9989389Z
> <https://github.com/apache/pulsar/runs/1789838309?check_suite_focus=true#step:6:18491>
> example failure 2021-01-28T02:42:14.3316285Z
> <https://github.com/apache/pulsar/runs/1781184081?check_suite_focus=true#step:6:18415>
> example failure 2021-01-27T21:44:09.7619772Z
> <https://github.com/apache/pulsar/runs/1778470820?check_suite_focus=true#step:6:6213>
>
> These links point to the exact line in the build log.
> For example:
> https://github.com/apache/pulsar/runs/1804628430?check_suite_focus=true#*step:6:12322*
>
> *Opening this link should navigate directly to line 12322 in step 6 of
> the workflow run log.*
>
> However, there's a bug in the GitHub UI: this doesn't work if the link is
> clicked from a page within github.com .
> The parameters and hash of the URL get lost and the focus doesn't go to
> the line where the error happened.
>
> *The workaround is to open the "example failure" links in a new tab/window
> by CTRL-click (Windows, Linux) or CMD-click (macOS).*
>
> I hope this helps investigate the flaky test failures more efficiently!
>
> BR,
>
> Lari
>
> On Tue, Feb 2, 2021 at 7:35 PM Lari Hotari <lari.hot...@sagire.fi> wrote:
>
>> The good progress continues!
>> One way to see the issue & PR activity where "flaky" is mentioned:
>> https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc
>> Thank you to the contributors and PR reviewers!
>>
>> Here's the next flaky test for someone to fix:
>> https://github.com/apache/pulsar/issues/6646 (reported a long time ago,
>> I added some examples of recent failures)
>> It's about PulsarFunctionsTest. This test class contributes to a lot of
>> failures.
>> I have uploaded a list of failures to
>> https://gist.github.com/lhotari/9bae3e16674c297a6bbc2b4831515a74 .
>> I haven't validated that all failures are from flaky test runs. It's
>> possible that some are from a build which broke the test.
>>
>> 1) Who could pick up fixing the multiple issues in PulsarFunctionsTest,
>> https://github.com/apache/pulsar/issues/6646 ? You can comment directly
>> on issue #6646 and start working on it if you wish. It would be a really
>> important fix to have.
>>
>> 2) Another one: https://github.com/apache/pulsar/issues/9431
>>
>> 3) The 3rd one might be a quick fix; it's an NPE in cleanup:
>> https://github.com/apache/pulsar/issues/9432
>>
>> I'm looking forward to the sprint continuing. It seems that the issues get
>> fixed sooner than I can report more of them. :)
>>
>> BR, Lari
>>
>>
>> On Mon, Feb 1, 2021 at 8:18 PM Lari Hotari <lari.hot...@sagire.fi> wrote:
>>
>>> Dear Pulsar community members,
>>>
>>> Thanks for picking up the work so quickly! I noticed that at least
>>> Renkai and Michael already pushed pull requests to fix the flaky tests that
>>> were mentioned in the previous email. Some of the PRs have already been
>>> merged.
>>>
>>> Here are 3 more flaky tests with links to a lot of example failures:
>>> https://github.com/apache/pulsar/issues/9407
>>> https://github.com/apache/pulsar/issues/9408
>>> https://github.com/apache/pulsar/issues/9409
>>>
>>> I'll report more flaky tests tomorrow. Today I was working on some
>>> tooling to mine the logs and gather some statistics.
>>>
>>> I parsed the logs of the last few days and these are the test methods
>>> that fail the most:
>>>
>>> 273 org.apache.pulsar.tests.integration.utils.DockerUtils$2.onComplete
>>> 102 org.apache.pulsar.compaction.CompactionTest.cleanup
>>> 81 org.apache.pulsar.admin.cli.PulsarAdminToolTest.topics
>>> 51 org.apache.pulsar.broker.loadbalance.LoadBalancerTest.testLeaderElection
>>> 45 org.apache.pulsar.io.PulsarFunctionE2ETest.shutdown
>>> 40 org.apache.pulsar.broker.service.ConsumedLedgersTrimTest.testConsumedLedgersTrimNoSubscriptions
>>> 36 org.apache.pulsar.websocket.proxy.ProxyPublishConsumeTest.cleanup
>>> 30 org.apache.pulsar.functions.worker.PulsarFunctionLocalRunTest.shutdown
>>> 30 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaExclamationTopicPatternFunction
>>> 29 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaExclamationFunction
>>> 27 org.apache.pulsar.client.api.v1.V1_ProducerConsumerTest.testConcurrentConsumerReceiveWhileReconnect
>>> 26 org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector.lambda$retryOperation$3
>>> 22 org.apache.pulsar.broker.service.ReplicatorTest.testResetCursorNotFail
>>> 22 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaLoggingFunction
>>> 21 org.apache.pulsar.tests.integration.SmokeTest.setup
>>> 20 org.apache.pulsar.client.impl.MessageIdTest.testChecksumReconnection
>>> 20 org.apache.pulsar.client.impl.MessageIdTest.testChecksumVersionComptability
>>> 19 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonFunctionLocalRun
>>> 19 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testAutoSchemaFunction
>>> 14 org.apache.pulsar.client.api.SimpleProducerConsumerTest.testConcurrentConsumerReceiveWhileReconnect
>>> 14 org.apache.pulsar.broker.service.MessagePublishBufferThrottleTest.testBlockByPublishRateLimiting
>>> 14 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testSlidingCountWindowTest
>>> 13 org.apache.pulsar.tests.integration.backwardscompatibility.ClientTest2_2.testResetCursorCompatibility
>>> 12 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonPublishFunction
>>> 12 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationTopicPatternFunction
>>> 12 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationFunctionWithExtraDeps
>>> 12 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationZipFunction
>>> 12 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonFunctionNegAck
>>> 12 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationFunction
>>> 12 org.apache.pulsar.compaction.CompactorTest.cleanup
>>> 12 org.apache.pulsar.broker.service.BrokerServiceAutoSubscriptionCreationTest.cleanupTest
>>> 12 org.apache.pulsar.websocket.proxy.ProxyAuthenticationTest.cleanup
>>> 12 org.apache.pulsar.websocket.proxy.v1.V1_ProxyAuthenticationTest.cleanup
>>> 12 org.apache.pulsar.client.impl.BatchMessageIndexAckTest.testBatchMessageIndexAckForSharedSubscription
>>> 11 org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaPublishFunction
>>> 11 org.apache.pulsar.broker.loadbalance.AntiAffinityNamespaceGroupTest.testBrokerSelectionForAntiAffinityGroup
>>>
>>> I'll report more flaky tests after I have checked that my tooling is
>>> producing correct results.
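
A tally in the shape of the "count, test method" list above could be produced with a few lines of Python. This is only a sketch and not the tooling mentioned in the message: the `FAILED: ` marker, the pattern, and the function names are all hypothetical, and real workflow run logs would need a different pattern.

```python
import re
from collections import Counter

# ASSUMPTION: each failure shows up in the log as a line containing
# "FAILED: <fully.qualified.Class.method>". This marker is hypothetical;
# it is not taken from the Pulsar build logs.
FAILURE_PATTERN = re.compile(r"FAILED: ([\w.$]+)")


def tally_failures(log_lines):
    """Count how often each test method appears in failure lines."""
    counts = Counter()
    for line in log_lines:
        match = FAILURE_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def format_tally(counts):
    """Render 'count name' lines, most frequent first, ties sorted by name."""
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [f"{count} {name}" for name, count in ordered]
```

Feeding the lines of several downloaded logs through `tally_failures` and printing `format_tally(...)` would give a ranking like the one in the message. As the thread notes, such counts still need checking, since some failures may come from builds that genuinely broke a test.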
>>>
>>> To contribute to fixing flaky tests, please pick one of the reported ones:
>>>
>>> https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc+is%3Aopen
>>>
>>> We can all join the #testing channel on Pulsar Slack to share detailed
>>> tips and tricks while working on fixing flaky tests.
>>>
>>> See you,
>>>
>>> BR, Lari
>>>
>>>
>>> On Fri, Jan 29, 2021 at 8:26 PM Lari Hotari <lari.hot...@sagire.fi>
>>> wrote:
>>>
>>>> Dear Pulsar community members,
>>>>
>>>> In order to improve our CI, we will have to fix the flaky tests. In
>>>> some cases it might be necessary to replace an existing test with a
>>>> redesigned test.
>>>>
>>>> The draft PIP "Changes to flaky test handling" document
>>>> <https://docs.google.com/document/d/10lmn4pW1IsT_8D1ZE0vMjASX0HhjdGdjB794iyScwns/edit?usp=sharing>
>>>> lists the top 10 flaky tests. A lot of them have already been addressed by
>>>> pull requests in the past week or so.
>>>>
>>>> This is the list of recent PRs that fix flaky tests from the top 10
>>>> flaky tests list:
>>>> https://github.com/apache/pulsar/pull/9286
>>>> https://github.com/apache/pulsar/pull/9243
>>>> https://github.com/apache/pulsar/pull/9258
>>>> https://github.com/apache/pulsar/pull/9356
>>>>
>>>> These are the GH issues for the remaining ones in the top 10 flaky
>>>> tests list:
>>>> https://github.com/apache/pulsar/issues/6368
>>>> https://github.com/apache/pulsar/issues/9369
>>>> https://github.com/apache/pulsar/issues/9368
>>>>
>>>> If you would like to help fix flaky tests, you can pick one of the
>>>> open issues above. Just add a comment on the issue when you start working
>>>> on it so that we can coordinate activities.
>>>>
>>>> It is also helpful to report a flaky test when you encounter one. I've
>>>> been using this type of template for reporting a flaky test:
>>>> https://gist.github.com/lhotari/a5c67359b362b4f3d8729330d65a2298 . The
>>>> issues #9368 and #9369 have been reported using this template.
>>>> Search for the test name before reporting so that we don't end up with
>>>> duplicates.
>>>>
>>>> The issues #6368, #9369 and #9368 are the next 3 important issues to
>>>> fix. I'm planning to create a more extensive list of the flaky failures so
>>>> that we can target the flakiest ones when we continue fixing the flaky
>>>> tests. I have some scripts in development to assist in mining the Pulsar
>>>> GitHub Actions workflow run logs.
>>>>
>>>> This is a search to find flaky issues in Pulsar GH issues:
>>>>
>>>> https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc+is%3Aopen
>>>>
>>>> Looking forward to the contributions for fixing flaky tests,
>>>>
>>>> BR,
>>>>
>>>> Lari
>>>>
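
The "example failure" links described in the Feb 3 message carry the workflow step and log line in the URL fragment (`#step:6:12322`). Below is a small sketch of pulling those two numbers out of such a link, for instance to locate the same line in a downloaded log. It assumes the fragment always has the `step:<step>:<line>` shape seen in the thread; the function name is hypothetical.

```python
from urllib.parse import urlsplit


def parse_log_anchor(url):
    """Return (step, line) from a '#step:<step>:<line>' fragment, else None."""
    # urlsplit keeps the query string and the fragment separate, so the
    # "?check_suite_focus=true" part does not interfere with the parsing.
    fragment = urlsplit(url).fragment
    parts = fragment.split(":")
    if len(parts) != 3 or parts[0] != "step":
        return None
    try:
        return int(parts[1]), int(parts[2])
    except ValueError:
        # ASSUMPTION: both fields are plain integers; anything else is rejected.
        return None
```

Called on the first example link from issue #9437, this yields step 6 and line 12322, matching the description in the message. A link that lost its fragment, which is what the message says happens when clicking from within github.com, yields `None`.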