lhotari commented on pull request #9393:
URL: https://github.com/apache/pulsar/pull/9393#issuecomment-771401543


   > Perhaps it is worth bumping the limit to something like 300 in this one 
case? If we only have 2 cores, we won't exceed that. Although, that does leave 
us with the potential to see this flakiness again if we ever give the test more 
cores.
   
   I guess we would have to experiment to find a solution that makes sense. I 
don't know this area of Pulsar, so I can't give direct advice. 
   
   
   > Do you think the broker's cpu utilization is high because they are still 
in the process of starting up? If so, perhaps your suggested `await` command 
could help by giving the brokers time to stabilize.
   
   My assumption was simply that the test might start executing before both 
brokers are available. That has been a source of flakiness at least in 
DiscoveryServiceTest. I didn't confirm this assumption in any way. A simple 
approach would be to make a change and check whether the problem still 
reproduces. Experimenting like that requires a way to reproduce the issue in 
an environment where you can run experiments quickly. 
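
   To illustrate the "wait until both brokers are available" idea, here is a 
minimal sketch of a polling helper that uses only the JDK. The class and 
method names (`WaitUntil`, `waitUntil`) are hypothetical and not part of 
Pulsar; the condition you pass in would have to be whatever check tells you 
both brokers are up, which I haven't worked out. 

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Pulsar: polls a condition until it holds
// or a timeout elapses.
public final class WaitUntil {
    private WaitUntil() {}

    /**
     * Returns true as soon as the condition holds, or false if the timeout
     * elapses (or the thread is interrupted) before that happens.
     */
    public static boolean waitUntil(BooleanSupplier condition,
                                    Duration timeout,
                                    Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

   A test could call this before its first assertion and fail fast when it 
returns false. A library such as Awaitility offers the same pattern with a 
richer API. 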
   
   Btw, I have published my "toolbox" as open source at 
https://github.com/lhotari/pulsar-contributor-toolbox 
It contains the shell script functions that I use for reproducing Pulsar 
flaky test failures. It works for me on Linux with zsh and sdkman (for JDK 
installation).

