BewareMyPower opened a new pull request, #17645: URL: https://github.com/apache/pulsar/pull/17645
Fixes #14848 ### Motivation There were several fixes on `ClientTest.testReferenceCount` but it's still very flaky. The root cause is even after a `Reader` went out of the scope and destructed, there was still a `Reader` object existed in the thread of the event loop. See https://github.com/apache/pulsar/blob/845daf5cac23a4dda4a209d91c85804a0bcaf28a/pulsar-client-cpp/lib/ReaderImpl.cc#L88 To verify this point, I added some logs and saw: ``` 2022-09-14 03:52:28.427 INFO [140046042864960] Reader:39 | Reader ctor 0x7fffd2a7c110 # ... 2022-09-14 03:52:28.444 INFO [140046039774976] Reader:42 | Reader ctor 0x7f5f0273d720 ReaderImpl(0x7f5efc00a9d0, 3) # ... 2022-09-14 03:52:28.445 INFO [140046042864960] ClientTest:217 | Reference count of the reader: 4 # ... 2022-09-14 03:52:28.445 INFO [140046042864960] ClientImpl:490 | Closing Pulsar client with 1 producers and 2 consumers 2022-09-14 03:52:28.445 INFO [140046039774976] Reader:55 | Reader dtor 0x7f5f0273d720 ReaderImpl(0x7f5efc00a9d0, 3) ``` The first `Reader` object 0x7fffd2a7c110 was constructed in main thread 140046042864960. However, it destructed before another `Reader` object 0x0x7f5f0273d720 that was constructed in event loop thread 140046039774976. When the callback passed to `createReaderAsync` completed the promise, the `createReader` immediately returns, at the same time the `Reader` object in the callback was still in the scope and not destructed. Since `Reader` holds a `shared_ptr<ReaderImpl>` and `ReaderImpl` holds a `shared_ptr<ConsumerImpl>`, if we check the reference count too quickly, the reference count of the underlying consumer is still positive because the `Reader` was not destructed at the moment. ### Modifications Since we cannot determine the precise destructed time point because that `Reader` object is in the event loop thread, we have to wait for a while. This PR adds a `waitUntil` utility function to wait for at most some time until the condition is met. Then wait until the reference count becomes 0 after the `Reader` object goes out of scope. Replace `ASSERT_EQ` with `EXPECT_EQ` to let the test continue if it failed. ### Verifying this change Following the steps here to reproduce: https://github.com/apache/pulsar/issues/14848#issuecomment-1246375370 The test never failed even with `--gtest_repeat=100`. ### Documentation <!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. --> - [ ] `doc-required` (Your PR needs to update docs and you will update later) - [x] `doc-not-needed` (Please explain why) - [ ] `doc` (Your PR contains doc changes) - [ ] `doc-complete` (Docs have been already added) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
