This is an automated email from the ASF dual-hosted git repository. xyz pushed a commit to branch branch-2.9 in repository https://gitbox.apache.org/repos/asf/pulsar.git
commit 869afe4984eab7eb1b525ad4cc468592d2a8afb2 Author: Yunze Xu <[email protected]> AuthorDate: Thu Mar 24 03:59:41 2022 +0800 [C++] Fix the race condition of connect timeout task (#14823) Fixes #14665 ### Motivation In C++ client, a connect timeout task is created each time before an asynchronous connect operation is performed, if the connection cannot be established in the configured timeout, the callback of the task will be called to close the connection and then the `createProducer` or `subscribe` methods will return `ResultConnectError`. `ClientConnection::connectTimeoutTask_`, which is a shared pointer, represents the timeout task. However, after `ClientConnection::close` is called, the shared pointer will be reset, and the underlying `PeriodicTask` object will be released. After that, when `stop` method is called on the released `PeriodicTask` object in the callback (`handleTcpConnected`), a segmentation fault will happen. The root cause is that `connectTimeoutTask_` can be accessed in two threads while one of them could release the memory. See #14665 for more explanations. This race condition leads to flaky Python tests as well, because we also have the similar test in Python tests. See https://github.com/apache/pulsar/blob/f7cbc1eb83ffd27b784d90d5d2dea8660c590ad2/pulsar-client-cpp/python/pulsar_test.py#L1207-L1221 So this PR might also fix #14714. ### Modifications Remove `connectTimeoutTask_.reset()` in `ClientConnection::close`. After that, the `connectTimeoutTask_` will always points to the same `PeriodicTask` object, whose methods are thread safe. ### Verifying this change Execute the following command ```bash ./tests/main --gtest_filter='ClientTest.testConnectTimeout' --gtest_repeat=10 ``` to runs the `testConnectTimeout` for 10 times. In my local env, it never failed, while before applying this patch, it's very easy to fail. (cherry picked from commit 0c3aad1e0ba0ee53784b963a1238d3d76b6dd8b2) --- pulsar-client-cpp/lib/ClientConnection.cc | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/pulsar-client-cpp/lib/ClientConnection.cc b/pulsar-client-cpp/lib/ClientConnection.cc index bcee5ca..efc3cd5 100644 --- a/pulsar-client-cpp/lib/ClientConnection.cc +++ b/pulsar-client-cpp/lib/ClientConnection.cc @@ -1549,10 +1549,7 @@ void ClientConnection::close(Result result) { consumerStatsRequestTimer_.reset(); } - if (connectTimeoutTask_) { - connectTimeoutTask_->stop(); - connectTimeoutTask_.reset(); - } + connectTimeoutTask_->stop(); lock.unlock(); LOG_INFO(cnxString_ << "Connection closed");
