BewareMyPower commented on issue #14665:
URL: https://github.com/apache/pulsar/issues/14665#issuecomment-1076475922
I've added some logs and simplify the test to following code.
```c++
TEST(ClientTest, testConnectTimeout) {
// 192.0.2.0/24 is assigned for documentation, should be a deadend
const std::string blackHoleBroker = "pulsar://192.0.2.1:1234";
const std::string topic = "test-connect-timeout";
Client clientDefault(blackHoleBroker);
std::promise<Result> promiseDefault;
clientDefault.createProducerAsync(
topic, [&promiseDefault](Result result, Producer producer) {
promiseDefault.set_value(result); });
auto futureDefault = promiseDefault.get_future();
ASSERT_EQ(futureDefault.wait_for(std::chrono::milliseconds(10)),
std::future_status::timeout);
clientDefault.close();
}
```
The logs are:
```
2022-03-23 22:54:39.757 INFO [0x115853600] ClientConnection:180 | XYZ
ClientConnection, create
2022-03-23 22:54:39.759 INFO [0x115853600] ClientConnection:190 | [<none>
-> pulsar://192.0.2.1:1234] Create ClientConnection, timeout=10000
2022-03-23 22:54:39.759 INFO [0x115853600] ConnectionPool:96 | Created
connection for pulsar://192.0.2.1:1234
2022-03-23 22:54:39.760 INFO [0x70000e591000] ClientConnection:570 | XYZ
handleResolve, start
2022-03-23 22:54:39.770 INFO [0x115853600] ClientImpl:496 | Closing Pulsar
client with 0 producers and 0 consumers
2022-03-23 22:54:39.770 INFO [0x115853600] ClientConnection:1560 | XYZ
close, stop & reset: 1
2022-03-23 22:54:39.770 WARN [0x70000e591000] ClientConnection:438 |
[<none> -> pulsar://192.0.2.1:1234] Failed to establish connection: Operation
canceled
2022-03-23 22:54:39.770 INFO [0x115853600] ClientConnection:1567 | [<none>
-> pulsar://192.0.2.1:1234] Connection closed
2022-03-23 22:54:39.770 INFO [0x70000e591000] ClientConnection:445 | XYZ
handleTcpConnected, stop
zsh: segmentation fault ./tests/main
--gtest_filter='ClientTest.testConnectTimeout'
```
It should be noted that `ClientConnection::connectTimeoutTask_` can be
accessed by multiple threads. If `ClientConnection::close` is called before the
`ClientConnection::handleTcpConnected`, the `connectTimeoutTask_` would be
reset, see
https://github.com/apache/pulsar/blob/f7cbc1eb83ffd27b784d90d5d2dea8660c590ad2/pulsar-client-cpp/lib/ClientConnection.cc#L1556
Then the reference count of the `PeriodicalTask` becomes zero, and the
memory will be released. After that, when `handleTcpConnected` accesses the
field of released `PeriodicalTask` object, `EXC_BAD_ACCESS` will happen.
> * thread #2: tid = 0x2ab6e, 0x000000010474ce7f
libpulsar.2.10.0-SNAPSHOT.dylib`bool
std::__1::__cxx_atomic_compare_exchange_strong<pulsar::PeriodicTask::State>(__a=0x0000000000000010,
__expected=0x000070000b3b21a7, __value=Closing,
__success=memory_order_seq_cst, __failure=memory_order_seq_cst) at
atomic:1039:12, stop reason = EXC_BAD_ACCESS (code=1, address=0x10)
Here is a very simple demo to explain the bug.
```c++
#include <iostream>
#include <chrono>
#include <thread>
#include <atomic>
#include <memory>
using namespace std;
class IntWrapper {
public:
int increaseAndGet() { return x_++; }
private:
std::atomic_int x_{0};
};
class Demo {
public:
void close() { i_.reset(); }
void run() { std::cout << i_->increaseAndGet() << std::endl; }
private:
std::shared_ptr<IntWrapper> i_{new IntWrapper};
};
int main(int argc, char* argv[]) {
Demo demo;
std::thread t{[&demo] { demo.close(); }}; // the internal shared
pointer was reset before `run()` is called
t.detach();
std::this_thread::sleep_for(std::chrono::seconds(1));
demo.run();
return 0;
}
```
Use `lldb` to debug:
```
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS
(code=1, address=0x0)
frame #0: 0x0000000100002338 a.out`int
std::__1::__cxx_atomic_fetch_add<int>(std::__1::__cxx_atomic_base_impl<int>*,
int, std::__1::memory_order) + 152
a.out`std::__1::__cxx_atomic_fetch_add<int>:
-> 0x100002338 <+152>: lock
0x100002339 <+153>: xaddl %eax, (%rcx)
0x10000233c <+156>: movl %eax, -0x18(%rbp)
0x10000233f <+159>: movl -0x18(%rbp), %eax
Target 0: (a.out) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS
(code=1, address=0x0)
* frame #0: 0x0000000100002338 a.out`int
std::__1::__cxx_atomic_fetch_add<int>(std::__1::__cxx_atomic_base_impl<int>*,
int, std::__1::memory_order) + 152
frame #1: 0x0000000100002291 a.out`std::__1::__atomic_base<int,
true>::fetch_add(int, std::__1::memory_order) + 33
frame #2: 0x0000000100002262 a.out`std::__1::__atomic_base<int,
true>::operator++(int) + 34
frame #3: 0x00000001000021c7 a.out`IntWrapper::increaseAndGet() + 23
frame #4: 0x000000010000134d a.out`Demo::run() + 29
frame #5: 0x000000010000117f a.out`main + 127
frame #6: 0x00000001000194fe dyld`start + 462
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]