lhotari commented on pull request #9393: URL: https://github.com/apache/pulsar/pull/9393#issuecomment-771401543
> Perhaps it is worth bumping the limit to something like 300 in this one case? If we only have 2 cores, we won't exceed that. Although, that does leave us with the potential to see this flakiness again if we ever give the test more cores.

I guess we would have to experiment to find a solution that makes sense. I don't know this area of Pulsar, so I can't give direct advice.

> Do you think the broker's cpu utilization is high because they are still in the process of starting up? If so, perhaps your suggested `await` command could help by giving the brokers time to stabilize.

My assumption was simply that the test might start executing before both brokers are available. That has been a source of flakiness, at least in DiscoveryServiceTest. I haven't confirmed this assumption in any way. A simple approach would be to make changes and check whether the problem still reproduces. Experimenting requires a way to reproduce the issue in an environment where you can run experiments quickly.

By the way, I have published my "toolbox" as open source: https://github.com/lhotari/pulsar-contributor-toolbox
It contains shell script functions that I use for reproducing Pulsar flaky test failures. It works for me on Linux with zsh and sdkman (for JDK installation).
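
To illustrate the idea of letting the test wait until both brokers are available, here is a minimal sketch of a polling helper in plain Java. Everything in it is an assumption made for illustration: `BrokerReadiness`, `waitUntilAllReady`, and the simulated readiness checks are hypothetical names, not Pulsar or Awaitility APIs, and a real test would need its own way to decide that a broker is ready.

```java
import java.time.Duration;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration only; not part of Pulsar or its test utilities.
final class BrokerReadiness {

    /**
     * Polls all checks until every one returns true or the timeout elapses.
     * Returns true if all checks passed in time, false on timeout or interruption.
     */
    static boolean waitUntilAllReady(List<BooleanSupplier> checks,
                                     Duration timeout,
                                     Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            boolean allReady = true;
            for (BooleanSupplier check : checks) {
                if (!check.getAsBoolean()) {
                    allReady = false;
                    break;
                }
            }
            if (allReady) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out before every check passed
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Simulated brokers (assumption): each becomes "ready" after a fixed delay.
        BooleanSupplier broker1 = () -> System.currentTimeMillis() - start >= 50;
        BooleanSupplier broker2 = () -> System.currentTimeMillis() - start >= 120;

        boolean ready = waitUntilAllReady(
                List.of(broker1, broker2), Duration.ofSeconds(5), Duration.ofMillis(10));
        System.out.println("brokers ready: " + ready);
    }
}
```

In a test, a call like this would sit at the start of the test method or in a setup step, and the test would fail fast if it returns false. Whether this actually removes the flakiness is unconfirmed and would still have to be checked by experiment.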
