This is an automated email from the ASF dual-hosted git repository. xyz pushed a commit to branch branch-2.8 in repository https://gitbox.apache.org/repos/asf/pulsar.git
commit 2c7a19a7ec73d98b57731ca5ca905b907cd12cc7 Author: Yunze Xu <[email protected]> AuthorDate: Thu Mar 24 03:59:41 2022 +0800 [C++] Fix the race condition of connect timeout task (#14823) Fixes #14665 ### Motivation In C++ client, a connect timeout task is created each time before an asynchronous connect operation is performed, if the connection cannot be established in the configured timeout, the callback of the task will be called to close the connection and then the `createProducer` or `subscribe` methods will return `ResultConnectError`. `ClientConnection::connectTimeoutTask_`, which is a shared pointer, represents the timeout task. However, after `ClientConnection::close` is called, the shared pointer will be reset, and the underlying `PeriodicTask` object will be released. After that, when `stop` method is called on the released `PeriodicTask` object in the callback (`handleTcpConnected`), a segmentation fault will happen. The root cause is that `connectTimeoutTask_` can be accessed in two threads while one of them could release the memory. See #14665 for more explanations. This race condition leads to flaky Python tests as well, because we also have the similar test in Python tests. See https://github.com/apache/pulsar/blob/f7cbc1eb83ffd27b784d90d5d2dea8660c590ad2/pulsar-client-cpp/python/pulsar_test.py#L1207-L1221 So this PR might also fix #14714. ### Modifications Remove `connectTimeoutTask_.reset()` in `ClientConnection::close`. After that, the `connectTimeoutTask_` will always points to the same `PeriodicTask` object, whose methods are thread safe. ### Verifying this change Execute the following command ```bash ./tests/main --gtest_filter='ClientTest.testConnectTimeout' --gtest_repeat=10 ``` to runs the `testConnectTimeout` for 10 times. In my local env, it never failed, while before applying this patch, it's very easy to fail. (cherry picked from commit 0c3aad1e0ba0ee53784b963a1238d3d76b6dd8b2) --- pulsar-client-cpp/lib/ClientConnection.cc | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/pulsar-client-cpp/lib/ClientConnection.cc b/pulsar-client-cpp/lib/ClientConnection.cc index 3a52191..2747de0 100644 --- a/pulsar-client-cpp/lib/ClientConnection.cc +++ b/pulsar-client-cpp/lib/ClientConnection.cc @@ -1526,10 +1526,7 @@ void ClientConnection::close(Result result) { consumerStatsRequestTimer_.reset(); } - if (connectTimeoutTask_) { - connectTimeoutTask_->stop(); - connectTimeoutTask_.reset(); - } + connectTimeoutTask_->stop(); lock.unlock(); LOG_INFO(cnxString_ << "Connection closed");
