BewareMyPower opened a new issue #6822:
URL: https://github.com/apache/pulsar/issues/6822


   **Describe the bug**
   With the C++ client, if a `Producer` calls `close()` while some pending messages have already expired, a segmentation fault may occur afterwards.
   
   **To Reproduce**
   Run [SendAsyncThenClose.cpp](https://gist.github.com/BewareMyPower/2c4760cec0c6f19985700f2b10f56334), which does the following:
   1. Create a producer with a 1 ms send timeout.
   2. Call `sendAsync` to send a large number of messages, e.g. 10000.
   3. Close the producer.
   4. Many send callbacks are invoked after `close()`, and a segmentation fault may then occur.
   
   **Expected behavior**
   After the producer calls `close()`, pending send callbacks should be discarded and not called.
   
   **Additional context**
   The program may also exit normally, so several runs may be needed to reproduce the error, e.g.:
   
   ```
   2020-04-26 13:21:15.221 INFO  ProducerImpl:493 | [persistent://public/default/FooTest, standalone-0-51] Closing producer for topic persistent://public/default/FooTest
   [OK] msg 8951: (3514,7,-1,951)
   # ....
   [OK] msg 8999: (3514,7,-1,999)
   [FAILED] msg 9000
   # ...
   [FAILED] msg 9999
   Segmentation fault
   ```
   
   The backtrace (with path prefixes removed):
   ```
   #0  0x00007f5ad24dd53f in boost::date_time::microsec_clock<boost::posix_time::ptime>::create_time (converter=<optimized out>)
       at boost/date_time/microsec_time_clock.hpp:86
   #1  boost::date_time::microsec_clock<boost::posix_time::ptime>::universal_time ()
       at boost/date_time/microsec_time_clock.hpp:78
   #2  boost::asio::time_traits<boost::posix_time::ptime>::now () at boost/asio/time_traits.hpp:48
   #3  boost::asio::detail::deadline_timer_service<boost::asio::time_traits<boost::posix_time::ptime> >::expires_from_now (ec=..., expiry_time=..., impl=..., 
       this=<optimized out>) at boost/asio/detail/deadline_timer_service.hpp:213
   #4  boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::executor>::expires_from_now (
       this=0x0, expiry_time=...) at boost/asio/basic_deadline_timer.hpp:548
   #5  0x00007f5ad257a736 in pulsar::ProducerImpl::handleSendTimeout (this=0xbc001250, err=...)
       at pulsar/pulsar-client-cpp/lib/ProducerImpl.cc:578
   ```
   
   Frame 5:
   ```c++
   573          if (diff.total_milliseconds() <= 0) {
   574              // The diff is less than or equal to zero, meaning that the message has been expired.
   575              LOG_DEBUG(getName() << "Timer expired. Calling timeout callbacks.");
   576              failPendingMessages(ResultTimeout);
   577              // Since the pending queue is cleared now, set timer to expire after configured value.
   578              sendTimer_->expires_from_now(milliseconds(conf_.getSendTimeout()));
   579          } else {
   ```
   
   ```
   (gdb) p sendTimer_ 
   $1 = {<std::__shared_ptr<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::executor>, (__gnu_cxx::_Lock_policy)2>> = {_M_ptr = 0x0, _M_refcount = {_M_pi = 0x0}}, <No data fields>}
   ```
   
   This happens because `ProducerImpl::closeAsync()` calls `cancelTimers`, which cancels `sendTimer_` and resets it to nullptr. However, a `ProducerImpl::handleSendTimeout` handler that is already queued in the event loop of the `Executor` may still run afterwards, and it does not check whether `sendTimer_` has been cancelled.
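
The ordering described above can be illustrated with a small, self-contained sketch. It does not use Pulsar or Boost.Asio: `FakeTimer`, `FakeEventLoop` and `FakeProducer` are hypothetical stand-ins, and the null check in `handleSendTimeout` is only one possible guard, not necessarily the fix the project would adopt.

```cpp
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-in for the Boost.Asio deadline timer.
struct FakeTimer {
    int rearmCount = 0;
    void expiresFromNow(int /*milliseconds*/) { ++rearmCount; }
};

// Hypothetical stand-in for the executor's event loop: handlers are queued
// first and run later, possibly after the producer has been closed.
struct FakeEventLoop {
    std::deque<std::function<void()>> handlers;

    void post(std::function<void()> handler) {
        handlers.push_back(std::move(handler));
    }

    void runAll() {
        while (!handlers.empty()) {
            std::function<void()> handler = std::move(handlers.front());
            handlers.pop_front();
            handler();
        }
    }
};

// Hypothetical, heavily simplified model of the producer.
class FakeProducer {
public:
    FakeProducer() : sendTimer_(std::make_shared<FakeTimer>()) {}

    // Models the effect described for cancelTimers(): the timer is reset.
    void close() { sendTimer_.reset(); }

    // Guarded timeout handler. Returns true if the timer was re-armed,
    // false if the producer had already been closed.
    bool handleSendTimeout(int sendTimeoutMs) {
        // Copy the pointer so the timer stays alive for this call.
        std::shared_ptr<FakeTimer> timer = sendTimer_;
        if (!timer) {
            return false;  // closed: do not dereference a null timer
        }
        timer->expiresFromNow(sendTimeoutMs);
        return true;
    }

    bool hasTimer() const { return static_cast<bool>(sendTimer_); }

private:
    std::shared_ptr<FakeTimer> sendTimer_;
};
```

Posting `handleSendTimeout` to the loop, calling `close()`, and only then calling `runAll()` reproduces the ordering from the backtrace. With the guard in place, the handler returns early instead of dereferencing a null pointer. This single-threaded model does not cover a `close()` that runs concurrently on another thread, which would need additional synchronization.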

