zbentley opened a new pull request, #16535:
URL: https://github.com/apache/pulsar/pull/16535

   Fixes https://github.com/apache/pulsar/issues/16527
   
   ### Motivation
   
   Copied from issue discussion: 
   
   The problem is the 
[GIL](https://wiki.python.org/moin/GlobalInterpreterLock). When Python object 
reference counts are manipulated from compiled code, those manipulations are 
not atomic or protected by the GIL in any way. Incrementing a refcount is often 
coincidentally safe to do without the GIL, since the data structures in the 
Python interpreter that are altered by a refcount-bump are few and not terribly 
shared. However, decrementing a refcount without the GIL is extremely 
dangerous; the act of decrementing a refcount can trigger object destruction, 
which can then trigger more object destruction, and so on: decrementing a 
refcount triggers an arbitrary number of user functions (destructors) to run in 
Python, and can trigger wide-ranging changes (including system calls, memory 
allocation/deallocation--basically anything) across the interpreter's internal 
state.
   
   Running such operations in true multi-threaded parallel in Python is 
basically guaranteed to break things. In most cases (I'm guessing here, as I 
don't know Boost/C++ well), I think the attempt to clean up the reference 
either blocks or fails in such a way that the C++ runtime won't properly clean 
up an object, preventing thread reaping from running internally. In some cases, 
the racy python GC operations overlap with shared interpreter data structures 
and cause segfaults. In rare cases, "impossible" (i.e. random objects changing 
type) errors are raised in Python itself, though you may have to run the above 
snippet for a long time to see one of those.
   
   Such GIL-unprotected refcount manipulation occurs in the Pulsar client 
[here](https://github.com/apache/pulsar/blob/master/pulsar-client-cpp/python/src/config.cc#L106),
 
[here](https://github.com/apache/pulsar/blob/master/pulsar-client-cpp/python/src/config.cc#L114),
 
[here](https://github.com/apache/pulsar/blob/master/pulsar-client-cpp/python/src/config.cc#L172),
 and 
[here](https://github.com/apache/pulsar/blob/master/pulsar-client-cpp/python/src/config.cc#L176),
 though the first and third may be safe from this condition by dint of the fact 
that they're only invoked directly from calling Python code which already has 
the GIL.
   
   ### Modifications
   
   - **Done** protected logger handle refcount mutations with the GIL, 
resolving a segfault risk.
   - **Done** resolve thread leak.
   - **Done** add tests.
   - **Done** don't acquire GIL for logging messages that won't be emitted due 
to log level.
   
   
   ### Verifying this change
   
   This change added tests and can be verified via the Python client test 
suite: `python pulsar_test.py`.
     
   - [x] `doc-not-needed` 
   Bugfix; external contracts do not change.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to