BewareMyPower opened a new pull request #11882:
URL: https://github.com/apache/pulsar/pull/11882
Fixes #11847

### Motivation

A deadlock can happen when the Python client enables custom logging. The stack traces in #11847 show three threads when the Python program hung:

1. The thread that calls `ExecutorServiceProvider::close`, which waits for all worker threads of `ExecutorService` to complete via `std::thread::join`.
2. The thread that uses the Python object for logging. It was stuck at `PyGILState_Ensure`, which tries to acquire the Python GIL, and is called from `ClientConnection::handleRead`. Since all pending events were cancelled by `boost::asio::io_service::stop`, these callbacks completed immediately with `boost::asio::error::operation_aborted`.
3. The worker thread of `ExecutorService`. It waited until all callbacks, including `ClientConnection::handleRead`, had completed.

The root cause might be related to the Python GIL: CPython APIs might not work well in C++ destructors, possibly because of object lifetime issues. The direct cause, however, is that thread 1 was blocked joining a worker thread.

### Modifications

Detach the worker thread instead of joining it in `ExecutorService::close` to avoid the potential deadlock. `close` can be called from `ClientImpl`'s destructor, which calls `shutdown`. Even without this issue, it's better not to call such blocking methods in C++ destructors.

In addition, this PR lowers the log level from error to debug for the `boost::asio::error::operation_aborted` error code, which indicates that a registered event was cancelled because the event loop was closed.
