Hi all,

Here's the next batch of flaky test issues:

#9459 Flaky-test: PulsarFunctionsTest.testDebeziumPostgreSqlSource
<https://github.com/apache/pulsar/issues/9459>

#9458 Flaky-test: ReplicatorTest.testReplication
<https://github.com/apache/pulsar/issues/9458>

#9457 Flaky-test: ReplicatorTest.testReplicatorOnPartitionedTopic
<https://github.com/apache/pulsar/issues/9457>

#9456 Flaky-test: TestProxy
<https://github.com/apache/pulsar/issues/9456>

#9455 Flaky-test: PulsarFunctionsTest.testCustomSerdeFunction
<https://github.com/apache/pulsar/issues/9455>

#9454 Flaky-test: CLITest.testCreateSubscriptionCommand
<https://github.com/apache/pulsar/issues/9454>

#9453 Flaky-test: PulsarFunctionsProcessTest.testAvroSchemaFunction
<https://github.com/apache/pulsar/issues/9453>

#9452 Flaky-test: org.apache.pulsar.tests.integration.SmokeTest.setup
<https://github.com/apache/pulsar/issues/9452>

#9451 Flaky-test: SimpleProducerConsumerTest.testConcurrentConsumerReceiveWhileReconnect
<https://github.com/apache/pulsar/issues/9451>

#9450 Flaky-test: org.apache.pulsar.broker.service.ReplicatorTest.testResetCursorNotFail
<https://github.com/apache/pulsar/issues/9450>

The ReplicatorTest
(https://github.com/apache/pulsar/blob/master/pulsar-broker/src/test/java/org/apache/pulsar/broker/service/ReplicatorTest.java)
contributes to a lot of failures. Here's a complete list of example
failures: https://gist.github.com/lhotari/ff58a94ef42bc6ed41165ed10c7d1cfd
Fixing it would have a really great impact. I filed 2 issues about
ReplicatorTest.

Keep up the good work in fixing flaky tests. There are again a lot of great
contributions. Thank you!

BR, Lari




On Wed, Feb 3, 2021 at 6:35 AM Lari Hotari <lari.hot...@sagire.fi> wrote:

> Hi all,
>
> The recently reported flaky test GitHub issues
> (https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc+is%3Aissue+is%3Aopen)
> contain links to recent failures of each particular flaky test.
>
> Example from https://github.com/apache/pulsar/issues/9437 :
> example failure 2021-02-01T09:41:10.0922161Z
> <https://github.com/apache/pulsar/runs/1804628430?check_suite_focus=true#step:6:12322>
> example failure 2021-01-29T07:51:57.9989389Z
> <https://github.com/apache/pulsar/runs/1789838309?check_suite_focus=true#step:6:18491>
> example failure 2021-01-28T02:42:14.3316285Z
> <https://github.com/apache/pulsar/runs/1781184081?check_suite_focus=true#step:6:18415>
> example failure 2021-01-27T21:44:09.7619772Z
> <https://github.com/apache/pulsar/runs/1778470820?check_suite_focus=true#step:6:6213>
>
> These links point to the exact line in the build log.
> For example:
> https://github.com/apache/pulsar/runs/1804628430?check_suite_focus=true#step:6:12322
>
> *When opening this link, it should navigate directly to line number
> 12322 in step 6 of the workflow run log.*
>
> However, due to a bug in the GitHub UI, this doesn't work if the link is
> clicked from a page within github.com. The parameters and the hash of the
> URL get lost, and the focus doesn't go to the line where the error happened.
>
> *The workaround is to open the "example failure" links in a new tab/window
> with Ctrl-click (Windows, Linux) or Cmd-click (macOS).*
>
> I hope this helps investigate the flaky test failures more efficiently!
>
> BR,
>
> Lari
>
> On Tue, Feb 2, 2021 at 7:35 PM Lari Hotari <lari.hot...@sagire.fi> wrote:
>
>> The good progress continues!
>> One way to see the issue & PR activity where "flaky" is mentioned:
>> https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc
>> Thank you to the contributors and PR reviewers!
>>
>> Here's the next flaky test for someone to fix:
>> https://github.com/apache/pulsar/issues/6646 (reported a long time ago;
>> I added some examples of recent failures).
>> It's about PulsarFunctionsTest. This test class contributes to a lot of
>> failures. I have uploaded a list of failures to
>> https://gist.github.com/lhotari/9bae3e16674c297a6bbc2b4831515a74 .
>> I haven't validated that all failures are from flaky test runs. It's
>> possible that some are from a build which broke the test.
>>
>> 1) Who could pick up fixing the multiple issues in PulsarFunctionsTest
>> (https://github.com/apache/pulsar/issues/6646)? You can comment directly
>> on issue #6646 and start working on it if you wish. It would be a really
>> important fix to have.
>>
>> 2) Another one: https://github.com/apache/pulsar/issues/9431
>>
>> 3) The 3rd one might be a quick fix; it's an NPE in cleanup:
>> https://github.com/apache/pulsar/issues/9432
>>
>> I'm hoping the sprint continues. It seems that the issues get
>> fixed sooner than I can report more of them. :)
>>
>> BR, Lari
>>
>>
>> On Mon, Feb 1, 2021 at 8:18 PM Lari Hotari <lari.hot...@sagire.fi> wrote:
>>
>>> Dear Pulsar community members,
>>>
>>> Thanks for picking up the work so quickly! I noticed that at least
>>> Renkai and Michael have pushed pull requests to fix the flaky tests that
>>> were mentioned in the previous email. Some of the PRs have already been
>>> merged.
>>>
>>> Here are 3 more flaky tests with links to a lot of example failures:
>>> https://github.com/apache/pulsar/issues/9407
>>> https://github.com/apache/pulsar/issues/9408
>>> https://github.com/apache/pulsar/issues/9409
>>>
>>> I'll report more flaky tests tomorrow. Today I was working on some
>>> tooling to mine the logs and gather some statistics.
>>>
>>> I parsed the logs of the last few days, and these are the test methods
>>> that fail the most (number of failures, then test method):
>>>
>>> 273  org.apache.pulsar.tests.integration.utils.DockerUtils$2.onComplete
>>> 102  org.apache.pulsar.compaction.CompactionTest.cleanup
>>>  81  org.apache.pulsar.admin.cli.PulsarAdminToolTest.topics
>>>  51  org.apache.pulsar.broker.loadbalance.LoadBalancerTest.testLeaderElection
>>>  45  org.apache.pulsar.io.PulsarFunctionE2ETest.shutdown
>>>  40  org.apache.pulsar.broker.service.ConsumedLedgersTrimTest.testConsumedLedgersTrimNoSubscriptions
>>>  36  org.apache.pulsar.websocket.proxy.ProxyPublishConsumeTest.cleanup
>>>  30  org.apache.pulsar.functions.worker.PulsarFunctionLocalRunTest.shutdown
>>>  30  org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaExclamationTopicPatternFunction
>>>  29  org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaExclamationFunction
>>>  27  org.apache.pulsar.client.api.v1.V1_ProducerConsumerTest.testConcurrentConsumerReceiveWhileReconnect
>>>  26  org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector.lambda$retryOperation$3
>>>  22  org.apache.pulsar.broker.service.ReplicatorTest.testResetCursorNotFail
>>>  22  org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaLoggingFunction
>>>  21  org.apache.pulsar.tests.integration.SmokeTest.setup
>>>  20  org.apache.pulsar.client.impl.MessageIdTest.testChecksumReconnection
>>>  20  org.apache.pulsar.client.impl.MessageIdTest.testChecksumVersionComptability
>>>  19  org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonFunctionLocalRun
>>>  19  org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testAutoSchemaFunction
>>>  14  org.apache.pulsar.client.api.SimpleProducerConsumerTest.testConcurrentConsumerReceiveWhileReconnect
>>>  14  org.apache.pulsar.broker.service.MessagePublishBufferThrottleTest.testBlockByPublishRateLimiting
>>>  14  org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testSlidingCountWindowTest
>>>  13  org.apache.pulsar.tests.integration.backwardscompatibility.ClientTest2_2.testResetCursorCompatibility
>>>  12  org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonPublishFunction
>>>  12  org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationTopicPatternFunction
>>>  12  org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationFunctionWithExtraDeps
>>>  12  org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationZipFunction
>>>  12  org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonFunctionNegAck
>>>  12  org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationFunction
>>>  12  org.apache.pulsar.compaction.CompactorTest.cleanup
>>>  12  org.apache.pulsar.broker.service.BrokerServiceAutoSubscriptionCreationTest.cleanupTest
>>>  12  org.apache.pulsar.websocket.proxy.ProxyAuthenticationTest.cleanup
>>>  12  org.apache.pulsar.websocket.proxy.v1.V1_ProxyAuthenticationTest.cleanup
>>>  12  org.apache.pulsar.client.impl.BatchMessageIndexAckTest.testBatchMessageIndexAckForSharedSubscription
>>>  11  org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaPublishFunction
>>>  11  org.apache.pulsar.broker.loadbalance.AntiAffinityNamespaceGroupTest.testBrokerSelectionForAntiAffinityGroup
>>>
>>> I'll report more flaky tests after I have checked that my tooling is
>>> producing correct results.
>>>
>>> If you'd like to contribute to fixing flaky tests, please pick one of the
>>> reported flaky tests:
>>>
>>> https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc+is%3Aopen
>>>
>>> We can all join the #testing channel on Pulsar Slack to share detailed
>>> tips and tricks while working on fixing flaky tests.
>>>
>>> See you,
>>>
>>> BR, Lari
>>>
>>>
>>> On Fri, Jan 29, 2021 at 8:26 PM Lari Hotari <lari.hot...@sagire.fi>
>>> wrote:
>>>
>>>> Dear Pulsar community members,
>>>>
>>>> In order to improve our CI, we will have to fix the flaky tests. In
>>>> some cases it might be necessary to replace an existing test with a
>>>> redesigned test.
>>>>
>>>> The draft PIP "Changes to flaky test handling" document
>>>> <https://docs.google.com/document/d/10lmn4pW1IsT_8D1ZE0vMjASX0HhjdGdjB794iyScwns/edit?usp=sharing>
>>>> lists the top 10 flaky tests. A lot of them have already been addressed
>>>> by pull requests in the past week or so.
>>>>
>>>> This is the list of recent PRs that fix flaky tests from the top 10
>>>> flaky tests list:
>>>> https://github.com/apache/pulsar/pull/9286
>>>> https://github.com/apache/pulsar/pull/9243
>>>> https://github.com/apache/pulsar/pull/9258
>>>> https://github.com/apache/pulsar/pull/9356
>>>>
>>>> These are the GH issues for the remaining ones in the top 10 flaky
>>>> tests list:
>>>> https://github.com/apache/pulsar/issues/6368
>>>> https://github.com/apache/pulsar/issues/9369
>>>> https://github.com/apache/pulsar/issues/9368
>>>>
>>>> If you would like to help to fix flaky tests you can pick one of the
>>>> open issues above. Just add a comment on the issue when you start working
>>>> on it so that we can coordinate activities.
>>>>
>>>> It is also helpful to report a flaky test when you encounter one. I've
>>>> been using this type of template for reporting a flaky test:
>>>> https://gist.github.com/lhotari/a5c67359b362b4f3d8729330d65a2298 . The
>>>> issues #9368 and #9369 have been reported using this template.
>>>> Search for the test name before reporting so that we don't end up with
>>>> duplicates.
>>>>
>>>> The issues #6368, #9369 and #9368 are the next 3 important issues to
>>>> fix. I'm planning to create a more extensive list of the flaky failures so
>>>> that we can target the flakiest ones when we continue fixing the flaky
>>>> tests. I have some scripts in development to assist in mining the Pulsar
>>>> GitHub Actions workflow run logs.
>>>>
>>>> This search finds the flaky test issues among the Pulsar GitHub issues:
>>>>
>>>> https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc+is%3Aopen
>>>>
>>>> Looking forward to the contributions for fixing flaky tests,
>>>>
>>>> BR,
>>>>
>>>> Lari
>>>>
>>>
