lhotari commented on PR #21335:
URL: https://github.com/apache/pulsar/pull/21335#issuecomment-1754637243

   > I'm afraid that if we simply increase the waiting time, more resources are 
consumed and the CI can be more unstable. Do you have some insights that this 
issue is flaky due to this timeout too short, or it's the major flaky tests 
that we should workaround?
   
   In this case, increasing the timeout won't noticeably slow down the tests in 
general. This is a very local change.  
   One reason to make this change is to validate an assumption. My assumption in 
this particular case is that 500 milliseconds isn't sufficient in CI, perhaps 
due to a pause caused by GC or similar. Increasing the timeout from 500 ms to 
1500 ms will rule out that possibility without causing actual delays or harm.
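
   To illustrate why a longer timeout need not slow the test down, here is a 
minimal sketch of a polling wait. It is not the code from this PR: 
`BoundedWait` and `waitUntil` are hypothetical names, and it assumes the test 
waits on a condition that can be polled. The timeout is only an upper bound, 
so the wait returns as soon as the condition holds.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

public class BoundedWait {
    // Hypothetical helper (not a Pulsar API): polls the condition until it
    // holds or until timeoutMillis has elapsed, whichever comes first.
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;   // condition met: return immediately
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;  // timeout reached without the condition holding
            }
            Thread.sleep(pollMillis);
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicBoolean done = new AtomicBoolean(false);
        // Simulated background work that completes after roughly 50 ms.
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            done.set(true);
        });
        worker.start();

        long start = System.nanoTime();
        boolean ok = waitUntil(done::get, 1500, 10);
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
        System.out.println("condition met: " + ok + ", waited about " + elapsedMillis + " ms");
        worker.join();
    }
}
```

   With this shape, the full 1500 ms is spent only when the condition never 
becomes true, which is the failing case anyway.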
   
   > The ideal way is to build a determinate happens-before order and wait 
forever, but it's more challenging to implement so I don't insist it for such a 
fix.
   
   I agree. A large part of the problem is non-optimal test design. The flaky 
test problem in Pulsar has been going on for years, and it's like a 
whack-a-mole issue: when you eliminate one issue, new problems pop up elsewhere. 
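
   For reference, the deterministic happens-before approach mentioned in the 
quote could look roughly like the sketch below. It is an illustration only, 
not the test under discussion: `LatchOrdering` and `runAndAwait` are 
hypothetical names, and it assumes the code under test can signal completion 
explicitly. `CountDownLatch` guarantees that actions before `countDown()` 
happen-before actions after a successful return from `await()`.

```java
import java.util.concurrent.CountDownLatch;

public class LatchOrdering {
    // Plain (non-volatile) field: visibility to the waiting thread is
    // guaranteed by the latch's happens-before relationship.
    static int result;

    // Hypothetical example: starts a worker and waits for its explicit signal.
    static int runAndAwait() throws InterruptedException {
        CountDownLatch finished = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            result = 42;          // the work the test depends on
            finished.countDown(); // explicit completion signal
        });
        worker.start();
        finished.await();         // no timeout: blocks until the signal arrives
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("result = " + runAndAwait());
    }
}
```

   The trade-off is that a missed signal hangs the test instead of failing it 
quickly, so this style depends on an outer, test-level timeout.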
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
