BewareMyPower opened a new issue #6822: URL: https://github.com/apache/pulsar/issues/6822
**Describe the bug**
With the C++ client, if a `Producer` calls `close()` while some expired messages are still pending, a segmentation fault may follow.

**To Reproduce**
Steps to reproduce the behavior by running [SendAsyncThenClose.cpp](https://gist.github.com/BewareMyPower/2c4760cec0c6f19985700f2b10f56334):
1. Create a producer with a 1 ms send timeout.
2. Call `sendAsync` to send a large number of messages, e.g. 10000.
3. Close the producer.
4. Many send callbacks are called after `close()`, and then a segmentation fault may occur.

**Expected behavior**
After the producer calls `close()`, pending send callbacks should be discarded and not called.

**Additional context**
The program may exit normally, so several runs may be needed to reproduce the error, e.g.
```
2020-04-26 13:21:15.221 INFO ProducerImpl:493 | [persistent://public/default/FooTest, standalone-0-51] Closing producer for topic persistent://public/default/FooTest
[OK] msg 8951: (3514,7,-1,951)
# ....
[OK] msg 8999: (3514,7,-1,999)
[FAILED] msg 9000
# ...
[FAILED] msg 9999
Segmentation fault
```

The backtrace (I deleted the path prefixes):
```
#0  0x00007f5ad24dd53f in boost::date_time::microsec_clock<boost::posix_time::ptime>::create_time (converter=<optimized out>)
    at boost/date_time/microsec_time_clock.hpp:86
#1  boost::date_time::microsec_clock<boost::posix_time::ptime>::universal_time ()
    at boost/date_time/microsec_time_clock.hpp:78
#2  boost::asio::time_traits<boost::posix_time::ptime>::now ()
    at boost/asio/time_traits.hpp:48
#3  boost::asio::detail::deadline_timer_service<boost::asio::time_traits<boost::posix_time::ptime> >::expires_from_now (ec=..., expiry_time=..., impl=..., this=<optimized out>)
    at boost/asio/detail/deadline_timer_service.hpp:213
#4  boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::executor>::expires_from_now (this=0x0, expiry_time=...)
    at boost/asio/basic_deadline_timer.hpp:548
#5  0x00007f5ad257a736 in pulsar::ProducerImpl::handleSendTimeout (this=0xbc001250, err=...)
    at pulsar/pulsar-client-cpp/lib/ProducerImpl.cc:578
```

Frame 5:
```c++
573         if (diff.total_milliseconds() <= 0) {
574             // The diff is less than or equal to zero, meaning that the message has been expired.
575             LOG_DEBUG(getName() << "Timer expired. Calling timeout callbacks.");
576             failPendingMessages(ResultTimeout);
577             // Since the pending queue is cleared now, set timer to expire after configured value.
578             sendTimer_->expires_from_now(milliseconds(conf_.getSendTimeout()));
579         } else {
```
```
(gdb) p sendTimer_
$1 = {<std::__shared_ptr<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::executor>, (__gnu_cxx::_Lock_policy)2>> = {_M_ptr = 0x0, _M_refcount = {_M_pi = 0x0}}, <No data fields>}
```

This happens because `ProducerImpl::closeAsync()` calls `cancelTimers`, which cancels `sendTimer_` and resets it to nullptr. However, a `ProducerImpl::handleSendTimeout` call that is already queued in the event loop of the `Executor` may still run, and it does not check whether `sendTimer_` has been cancelled before dereferencing it at line 578.
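
One possible guard for the race between `close()` and an already-queued `handleSendTimeout` is sketched below. It is a simplified, Boost-free model and not the actual `ProducerImpl` code: `FakeTimer`, `SketchProducer`, `closed_`, and `expiresFromNow` are hypothetical names introduced only for illustration. The idea is that the handler copies the timer pointer under a lock and returns early if the producer is closed or the timer has been reset, rather than dereferencing a null `sendTimer_`.

```cpp
#include <cassert>
#include <chrono>
#include <memory>
#include <mutex>

// Hypothetical stand-in for the Boost.Asio deadline timer; it only counts re-arms.
struct FakeTimer {
    int rearmCount = 0;
    void expiresFromNow(std::chrono::milliseconds) { ++rearmCount; }
};

// Hypothetical, simplified model of the producer (assumption: not the real ProducerImpl).
class SketchProducer {
public:
    SketchProducer() : sendTimer_(std::make_shared<FakeTimer>()) {}

    // Models the effect described in the issue: closing resets the timer pointer.
    void close() {
        std::lock_guard<std::mutex> lock(mutex_);
        closed_ = true;
        sendTimer_.reset();
    }

    // Models a timeout handler that may run after close() because it was
    // already queued. Returns true if the timer was re-armed, false if skipped.
    bool handleSendTimeout() {
        std::shared_ptr<FakeTimer> timer;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (closed_) {
                return false;  // producer closed: do not touch the timer
            }
            timer = sendTimer_;  // local copy keeps the timer alive past the lock
        }
        if (!timer) {
            return false;  // timer already reset: nothing to re-arm
        }
        timer->expiresFromNow(sendTimeout_);
        return true;
    }

private:
    std::mutex mutex_;
    bool closed_ = false;
    std::shared_ptr<FakeTimer> sendTimer_;
    std::chrono::milliseconds sendTimeout_{1};
};
```

In this model, calling `handleSendTimeout()` before `close()` re-arms the timer, and calling it afterwards returns without dereferencing the reset pointer. Whether the real fix should check the producer state, the timer pointer, or the handler's error code is left open here.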
