BewareMyPower opened a new pull request #11882:
URL: https://github.com/apache/pulsar/pull/11882


   Fixes #11847 
   
   ### Motivation
   
   A deadlock can occur when the Python client enables custom logging. The 
stack traces in #11847 show three threads at the moment the Python program 
hung:
   1. The thread that calls `ExecutorServiceProvider::close`, which waits in 
`std::thread::join` until all worker threads of `ExecutorService` have finished.
   2. The thread that uses a Python object for logging. It is stuck in 
`PyGILState_Ensure`, trying to acquire the Python GIL. That call is reached from 
`ClientConnection::handleRead`: since all pending events were cancelled by 
`boost::asio::io_service::stop`, these callbacks completed immediately with 
`boost::asio::error::operation_aborted`.
   3. The worker thread of `ExecutorService`, which waits until all callbacks, 
including `ClientConnection::handleRead`, have completed.
   
   The root cause may be related to Python's GIL. CPython APIs do not seem to 
work well when called from C++ destructors, possibly because of object lifetime 
issues. The direct cause, however, is that thread 1 is blocked joining a 
worker thread.
   
   ### Modifications
   
   Detach the worker thread instead of joining it in `ExecutorService::close` 
to avoid the potential deadlock. The close method can be called from 
`ClientImpl`'s destructor, which calls `shutdown`. It's better to avoid 
calling blocking methods like these in C++ destructors even without this issue.
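
   The detach-instead-of-join idea can be sketched as follows. This is a 
minimal illustration and not the code in this PR: `DetachingWorker` and its 
`State` struct are hypothetical names, and a simple polling loop stands in for 
the `boost::asio` event loop. It assumes the worker only touches state it 
co-owns through a `std::shared_ptr`.

```cpp
#include <atomic>
#include <chrono>
#include <memory>
#include <thread>

// Hypothetical stand-in for ExecutorService; not the Pulsar implementation.
class DetachingWorker {
   public:
    // State co-owned by the worker thread, so it stays valid even after
    // the DetachingWorker object itself has been destroyed.
    struct State {
        std::atomic<bool> stopRequested{false};
        std::atomic<bool> finished{false};
    };

    DetachingWorker() : state_(std::make_shared<State>()) {
        std::shared_ptr<State> state = state_;  // copy captured by the thread
        worker_ = std::thread([state] {
            // Placeholder for running the event loop until it is stopped.
            while (!state->stopRequested) {
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
            }
            state->finished = true;
        });
    }

    // Requests a stop and returns immediately; it never waits for the worker.
    void close() {
        state_->stopRequested = true;
        if (worker_.joinable()) {
            worker_.detach();
        }
    }

    // Safe to run from a destructor because close() does not block.
    ~DetachingWorker() { close(); }

    std::shared_ptr<State> state() const { return state_; }

   private:
    std::shared_ptr<State> state_;
    std::thread worker_;
};
```

   Because the thread lambda holds its own copy of the `shared_ptr`, the 
detached worker never reads a member of a destroyed object, and calling 
`close()` twice is harmless since a detached thread is no longer joinable.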
   
   In addition, this PR reduces the log level from error to debug for the 
`boost::asio::error::operation_aborted` error code, which means the registered 
event was cancelled because the event loop was closed.
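
   The log-level choice could look roughly like the sketch below. To stay 
free of Boost it uses `std::errc::operation_canceled` as a stand-in for 
`boost::asio::error::operation_aborted`. `LogLevel` and `levelForReadError` 
are hypothetical names, not the client's logging API.

```cpp
#include <system_error>

// Hypothetical log levels; the real client uses its own logging macros.
enum class LogLevel { Debug, Error };

// Assumption: operation_canceled plays the role of
// boost::asio::error::operation_aborted in this sketch.
inline bool isAborted(const std::error_code& ec) {
    return ec == std::errc::operation_canceled;
}

// An aborted read is expected during shutdown, so it is logged at debug
// level; any other failure is still reported as an error.
inline LogLevel levelForReadError(const std::error_code& ec) {
    return isAborted(ec) ? LogLevel::Debug : LogLevel::Error;
}
```

   A read callback would call `levelForReadError(ec)` before logging, so that 
a normal shutdown does not fill the log with error lines.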


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

