BewareMyPower commented on issue #14665:
URL: https://github.com/apache/pulsar/issues/14665#issuecomment-1076475922


   I've added some logs and simplify the test to following code.
   
   ```c++
   TEST(ClientTest, testConnectTimeout) {
       // 192.0.2.0/24 is assigned for documentation, should be a deadend
       const std::string blackHoleBroker = "pulsar://192.0.2.1:1234";
       const std::string topic = "test-connect-timeout";
   
       Client clientDefault(blackHoleBroker);
   
       std::promise<Result> promiseDefault;
       clientDefault.createProducerAsync(
           topic, [&promiseDefault](Result result, Producer producer) { 
promiseDefault.set_value(result); });
   
       auto futureDefault = promiseDefault.get_future();
       ASSERT_EQ(futureDefault.wait_for(std::chrono::milliseconds(10)), 
std::future_status::timeout);
   
       clientDefault.close();
   }
   ```
   
   The logs are:
   
   ```
   2022-03-23 22:54:39.757 INFO  [0x115853600] ClientConnection:180 | XYZ 
ClientConnection, create
   2022-03-23 22:54:39.759 INFO  [0x115853600] ClientConnection:190 | [<none> 
-> pulsar://192.0.2.1:1234] Create ClientConnection, timeout=10000
   2022-03-23 22:54:39.759 INFO  [0x115853600] ConnectionPool:96 | Created 
connection for pulsar://192.0.2.1:1234
   2022-03-23 22:54:39.760 INFO  [0x70000e591000] ClientConnection:570 | XYZ 
handleResolve, start
   2022-03-23 22:54:39.770 INFO  [0x115853600] ClientImpl:496 | Closing Pulsar 
client with 0 producers and 0 consumers
   2022-03-23 22:54:39.770 INFO  [0x115853600] ClientConnection:1560 | XYZ 
close, stop & reset: 1
   2022-03-23 22:54:39.770 WARN  [0x70000e591000] ClientConnection:438 | 
[<none> -> pulsar://192.0.2.1:1234] Failed to establish connection: Operation 
canceled
   2022-03-23 22:54:39.770 INFO  [0x115853600] ClientConnection:1567 | [<none> 
-> pulsar://192.0.2.1:1234] Connection closed
   2022-03-23 22:54:39.770 INFO  [0x70000e591000] ClientConnection:445 | XYZ 
handleTcpConnected, stop
   zsh: segmentation fault  ./tests/main 
--gtest_filter='ClientTest.testConnectTimeout'
   ```
   
   It should be noted that `ClientConnection::connectTimeoutTask_` can be 
accessed by multiple threads. If `ClientConnection::close` is called before the 
`ClientConnection::handleTcpConnected`, the `connectTimeoutTask_` would be 
reset, see 
https://github.com/apache/pulsar/blob/f7cbc1eb83ffd27b784d90d5d2dea8660c590ad2/pulsar-client-cpp/lib/ClientConnection.cc#L1556
   
   Then the reference count of the `PeriodicalTask` becomes zero, and the 
memory will be released. After that, when `handleTcpConnected` accesses the 
field of released `PeriodicalTask` object, `EXC_BAD_ACCESS` will happen.
   
   > * thread #2: tid = 0x2ab6e, 0x000000010474ce7f 
libpulsar.2.10.0-SNAPSHOT.dylib`bool 
std::__1::__cxx_atomic_compare_exchange_strong<pulsar::PeriodicTask::State>(__a=0x0000000000000010,
 __expected=0x000070000b3b21a7, __value=Closing, 
__success=memory_order_seq_cst, __failure=memory_order_seq_cst) at 
atomic:1039:12, stop reason = EXC_BAD_ACCESS (code=1, address=0x10)
   
   Here is a very simple demo to explain the bug.
   
   ```c++
   #include <iostream>
   #include <chrono>
   #include <thread>
   #include <atomic>
   #include <memory>
   using namespace std;
   
   class IntWrapper {
      public:
       int increaseAndGet() { return x_++; }
   
      private:
       std::atomic_int x_{0};
   };
   
   class Demo {
      public:
       void close() { i_.reset(); }
   
       void run() { std::cout << i_->increaseAndGet() << std::endl; }
   
      private:
       std::shared_ptr<IntWrapper> i_{new IntWrapper};
   };
   
   int main(int argc, char* argv[]) {
       Demo demo;
       std::thread t{[&demo] { demo.close(); }};  // the internal shared 
pointer was reset before `run()` is called
       t.detach();
       std::this_thread::sleep_for(std::chrono::seconds(1));
       demo.run();
       return 0;
   }
   ```
   
   Use `lldb` to debug:
   
   ```
   * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS 
(code=1, address=0x0)
       frame #0: 0x0000000100002338 a.out`int 
std::__1::__cxx_atomic_fetch_add<int>(std::__1::__cxx_atomic_base_impl<int>*, 
int, std::__1::memory_order) + 152
   a.out`std::__1::__cxx_atomic_fetch_add<int>:
   ->  0x100002338 <+152>: lock   
       0x100002339 <+153>: xaddl  %eax, (%rcx)
       0x10000233c <+156>: movl   %eax, -0x18(%rbp)
       0x10000233f <+159>: movl   -0x18(%rbp), %eax
   Target 0: (a.out) stopped.
   (lldb) bt 
   * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS 
(code=1, address=0x0)
     * frame #0: 0x0000000100002338 a.out`int 
std::__1::__cxx_atomic_fetch_add<int>(std::__1::__cxx_atomic_base_impl<int>*, 
int, std::__1::memory_order) + 152
       frame #1: 0x0000000100002291 a.out`std::__1::__atomic_base<int, 
true>::fetch_add(int, std::__1::memory_order) + 33
       frame #2: 0x0000000100002262 a.out`std::__1::__atomic_base<int, 
true>::operator++(int) + 34
       frame #3: 0x00000001000021c7 a.out`IntWrapper::increaseAndGet() + 23
       frame #4: 0x000000010000134d a.out`Demo::run() + 29
       frame #5: 0x000000010000117f a.out`main + 127
       frame #6: 0x00000001000194fe dyld`start + 462
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to