devinbost commented on issue #6198: Flaky-test: PulsarStateTest.testSinkState
URL: https://github.com/apache/pulsar/issues/6198#issuecomment-581671178

This issue no longer seems to occur when the tests run under stress with larger timeouts (e.g. doubling the `retryCount` and `initSleepTimeInMillis` passed to `retryStrategically`). I pushed this change to my fork to see whether it also resolves the issue when running from GitHub CI.

The problem with the current approach to these tests is a race condition between checking the status through the Admin API (which executes a REST call) and the method responsible for producing the messages. `producer.send` blocks on the send operation, but it doesn't block on the receive operation. Ideally, we'd have a way to block (at least for a period of time) until all the messages are received, instead of needing to poll the status.

However, such a change would not necessarily fix the test, because we'd still depend on execution completing successfully within a bounded period of time. (We may not have a choice: without a timeout, the test could run indefinitely.) So that brings us back to the idea of increasing the timeouts when polling the status, to ensure we receive all the messages on a slow test runner.

@sijie @jiazhai @yjshen Thoughts?
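
To make the trade-off above concrete, here is a minimal sketch of the two options: polling a status with a retry count and an initial sleep, versus blocking on receipt with a timeout. `PollingSketch`, `pollUntil`, `maxWaitMillis` and `awaitAll` are hypothetical names for illustration and are not Pulsar's test utilities. The linear backoff is an assumption; the helper actually used by the test may sleep differently.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helpers for illustration only; not Pulsar's actual test code.
final class PollingSketch {

    // Polls `condition` up to retryCount times. Between attempts it sleeps
    // initSleepMillis * (attempt + 1), i.e. an ASSUMED linear backoff.
    // Returns true as soon as the condition holds, false if it never does.
    static boolean pollUntil(BooleanSupplier condition, int retryCount, long initSleepMillis) {
        for (int attempt = 0; attempt < retryCount; attempt++) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (attempt < retryCount - 1) {
                try {
                    Thread.sleep(initSleepMillis * (attempt + 1));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return false;
                }
            }
        }
        return false;
    }

    // Worst-case total sleep of pollUntil under the assumed linear backoff.
    static long maxWaitMillis(int retryCount, long initSleepMillis) {
        long sleeps = Math.max(retryCount - 1, 0);
        return initSleepMillis * sleeps * (sleeps + 1) / 2;
    }

    // Alternative to polling: block until every expected message has been
    // counted down by the receiver, or give up after the timeout.
    static boolean awaitAll(CountDownLatch received, long timeout, TimeUnit unit) {
        try {
            return received.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        CountDownLatch received = new CountDownLatch(3);
        // Simulated receiver that handles three messages with a small delay each.
        Thread receiver = new Thread(() -> {
            for (int i = 0; i < 3; i++) {
                try {
                    Thread.sleep(20);
                } catch (InterruptedException e) {
                    return;
                }
                received.countDown();
            }
        });
        receiver.start();

        boolean polled = pollUntil(() -> received.getCount() == 0, 5, 50);
        boolean awaited = awaitAll(received, 1, TimeUnit.SECONDS);
        System.out.println("polled=" + polled + " awaited=" + awaited);
        System.out.println("budget(5, 50)=" + maxWaitMillis(5, 50) + " ms");
        System.out.println("budget(10, 100)=" + maxWaitMillis(10, 100) + " ms");
    }
}
```

Both variants still need an upper bound on waiting, which is the point made above: blocking on receipt removes the race with the status check, but a slow runner can still exceed the timeout. Under the backoff assumed here, doubling both parameters grows the worst-case wait by much more than a factor of two, so the values are worth choosing deliberately.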