zbentley opened a new issue, #16009:
URL: https://github.com/apache/pulsar/issues/16009

   **Describe the bug**
   Occasionally, in flaky CI tests that run Python code in a heavily threaded 
environment, we see segfaults when calling `connect`, `create_producer`, or 
`producer.close()` on Pulsar client objects.
   
   I wish I had more debugging info or a full C system call trace, but the 
failures occur only in CI, where I can't use `gdb`, and Python's `faulthandler` 
unfortunately doesn't seem to produce a stack trace.
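
   For reference, a minimal sketch of one way to enable `faulthandler` for all 
threads with output sent to a dedicated file. The temporary log file and the 
`crash_log` name are assumptions for illustration, and this is not necessarily 
how it was configured in the CI runs described here.

```python
import faulthandler
import tempfile

# Hypothetical crash-log destination; any writable file with a real file
# descriptor works. Keep the object referenced for the life of the process,
# because faulthandler holds on to the file descriptor.
crash_log = tempfile.NamedTemporaryFile(
    mode="w", prefix="crash-", suffix=".log", delete=False
)

# Dump the Python traceback of every thread on SIGSEGV, SIGFPE, SIGABRT,
# SIGBUS, and SIGILL.
faulthandler.enable(file=crash_log, all_threads=True)
```

   Note that `faulthandler` reports Python-level frames only, so a crash inside 
the native client library would still lack C-level detail.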
   
   The errors always have these characteristics:
   - They occur in an environment with many threads.
   - Some of the threads have previously used a Pulsar client and are still 
alive, but are no longer using that client.
   - The current thread attempting to use the Pulsar client is doing a 
`connect`, `create_producer`, or `producer.close()` operation.
   - The error is either an unhandled SIGSEGV or an abort with the 
message [``__pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed``](https://stackoverflow.com/questions/9239999/pthread-mutex-lock-c62-pthread-mutex-lock-assertion-mutex-data-owner).
   
   I'm sorry I don't have more specific debugging or reproduction information.
   
   This only occurs on client 2.10.0; we're using that client with Python 
3.7.13 on Linux, in Docker, x86_64 arch.

