michaeljmarshall commented on issue #9916: URL: https://github.com/apache/pulsar/issues/9916#issuecomment-802583964
I made solid progress stepping through this test with a debugger. There does appear to be a race condition, caused by the design of the test combined with the way we send messages to dead letter queues. Essentially, we send a message to the DLQ before we acknowledge it on the original topic. In addition, when a consumer client is closed, all of its unacknowledged messages are released to any other consumers. If the consumer closes in the window between the DLQ send and the acknowledgment, the message is redelivered, and that is what makes the test flaky.

At this point, I have identified why the test is flaky, but I haven't determined the right fix (change the test or change the source). I will spend more time thinking about this and then either post follow-up questions and options or submit a PR. Fundamentally, I don't think this test should be too hard to fix.

I need to sign off for the night, but I am very interested in solving this one. I should be able to spend more time on it tomorrow or this weekend.
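
To make the ordering problem concrete, here is a minimal sketch of the race as a deterministic toy model. It is not Pulsar code: the class `DlqRaceSketch` and the helpers `sendToDlq`, `ackOriginal`, and `closeConsumer` are hypothetical names invented for illustration, and the real client does this work asynchronously. The sketch only assumes the two behaviors described above: the DLQ send happens before the ack, and closing a consumer releases its unacknowledged messages.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: none of these names come from the Pulsar codebase.
public class DlqRaceSketch {
    // Messages delivered to the first consumer but not yet acknowledged.
    private final Set<String> unacked = new LinkedHashSet<>();
    // Messages published to the simulated dead letter topic.
    private final List<String> deadLetterTopic = new ArrayList<>();
    // Messages released to another consumer on the same subscription.
    private final List<String> redelivered = new ArrayList<>();

    private void sendToDlq(String msg) {
        deadLetterTopic.add(msg);
    }

    private void ackOriginal(String msg) {
        unacked.remove(msg);
    }

    // Closing releases everything still unacknowledged to other consumers.
    private void closeConsumer() {
        redelivered.addAll(unacked);
        unacked.clear();
    }

    // Runs one interleaving and returns what the second consumer receives.
    public static List<String> run(boolean closeBeforeAck) {
        DlqRaceSketch s = new DlqRaceSketch();
        s.unacked.add("m1");   // m1 has exhausted its redeliveries
        s.sendToDlq("m1");     // step 1: publish to the DLQ
        if (closeBeforeAck) {
            s.closeConsumer(); // close lands inside the window
            s.ackOriginal("m1"); // too late, m1 was already released
        } else {
            s.ackOriginal("m1"); // step 2: ack on the original topic
            s.closeConsumer();
        }
        return s.redelivered;
    }

    public static void main(String[] args) {
        System.out.println("ack first:   redelivered=" + run(false));
        System.out.println("close first: redelivered=" + run(true));
    }
}
```

In this model, the "ack first" interleaving redelivers nothing, while the "close first" interleaving hands `m1` to the other consumer even though it is already on the dead letter topic. Which interleaving the real test hits depends on timing, which would explain the intermittent failures.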
