zbentley opened a new issue, #116: URL: https://github.com/apache/pulsar-client-python/issues/116
We've observed full Python interpreter lockups (not just "blocking": the interpreter calling the client halts and can't be unblocked, time out, or raise exceptions, even if the blocking operation is moved to a Python Thread and waited on with a timeout) in the presence of:

- The 2.10.1 Python client.
- Python threading (using the pulsar Client from a thread).
- Python asyncio/event loop Future manipulation.
- Consumers in the act of receiving messages (running the client's internal receive loop).
- Many nacks of the same message.
- Multiple consumers.
- Using a Python `logger=` argument to Client. We must do this, otherwise the logs emitted by the client to STDOUT fill up our disks.

All of those have to be present to trigger the issue.

When multiple Shared consumers are repeatedly nacking messages with a 15-second delay on a topic with a few hundred messages (100% of them are nacked over and over), all but one of the consumers eventually (within a few minutes) lock up: no Python code in the affected consumer process can run. It's not just that it's blocked in a `negative_acknowledge` call; all threads, signal handlers, coroutines, etc. in that interpreter are stuck.

This says GIL conflict to me.

While this program has many hundreds of threads, the stack traces from the most relevant ones are included here: [threads.txt](https://github.com/apache/pulsar-client-python/files/11406968/threads.txt)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
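The GIL reasoning in the report can be illustrated with the standard library alone. This is a minimal sketch, not a reproduction of the lockup: it does not use the pulsar client, and the helper names `join_with_timeout` and `blocking_call_that_releases_gil` are made up for illustration. It shows the normal case, where a blocking call releases the GIL and a timed `join` on the worker thread returns on schedule.

```python
import threading
import time


def join_with_timeout(target, timeout_s):
    """Run target in a daemon thread and wait at most timeout_s seconds.

    Returns True if the worker is still running when the wait ends.
    """
    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    # join() can only return if the calling thread is able to reacquire
    # the GIL after its timeout expires.
    worker.join(timeout=timeout_s)
    return worker.is_alive()


def blocking_call_that_releases_gil():
    # Stand-in for a well-behaved blocking call into native code:
    # time.sleep releases the GIL while it waits.
    time.sleep(2.0)


if __name__ == "__main__":
    started = time.monotonic()
    still_running = join_with_timeout(blocking_call_that_releases_gil, 0.2)
    elapsed = time.monotonic() - started
    # The timed wait ends long before the worker's 2-second sleep finishes.
    print(f"worker still running: {still_running}, waited {elapsed:.2f}s")
```

In the failure described above, the equivalent timed wait never returns. That is consistent with some thread holding the GIL indefinitely inside native code, though the sketch alone does not prove that this is the cause.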
