zbentley commented on issue #127:
URL: 
https://github.com/apache/pulsar-client-python/issues/127#issuecomment-1572093495

   Thanks @BewareMyPower. Here are the logs from a run (MacOS 11, Python 
3.10.9, client 3.1.0) that got stuck with 4 processes:
   ```
   Joining pool
   Joined pool
   2023-06-01 09:40:21.098 INFO  [0x104424580] ProducerImpl:697 | Producer - 
[persistent://chariot1/chariot_ns_sre--kms_test/chariot_topic_kms_test-partition-1,
 standalone-36-796] , [batching  = off]
   2023-06-01 09:40:21.098 INFO  [0x104424580] ClientConnection:1600 | 
[[::1]:53830 -> [::1]:6650] Connection closed with ConnectError
   2023-06-01 09:40:21.099 INFO  [0x104424580] ClientConnection:269 | 
[[::1]:53830 -> [::1]:6650] Destroyed connection
   2023-06-01 09:40:21.357 INFO  [0x104514580] ClientConnection:190 | [<none> 
-> pulsar://localhost:6650] Create ClientConnection, timeout=10000
   2023-06-01 09:40:21.357 INFO  [0x104514580] ConnectionPool:97 | Created 
connection for pulsar://localhost:6650
   2023-06-01 09:40:21.359 INFO  [0x16baf3000] ClientConnection:388 | 
[[::1]:53831 -> [::1]:6650] Connected to broker
   2023-06-01 09:40:21.375 INFO  [0x16baf3000] HandlerBase:72 | 
[persistent://chariot1/chariot_ns_sre--kms_test/chariot_topic_kms_test-partition-1,
 ] Getting connection from pool
   2023-06-01 09:40:21.386 INFO  [0x16baf3000] ProducerImpl:202 | 
[persistent://chariot1/chariot_ns_sre--kms_test/chariot_topic_kms_test-partition-1,
 ] Created producer on broker [[::1]:53831 -> [::1]:6650]
   Destroying connections
   2023-06-01 09:40:21.402 INFO  [0x104514580] ProducerImpl:697 | Producer - 
[persistent://chariot1/chariot_ns_sre--kms_test/chariot_topic_kms_test-partition-1,
 standalone-36-797] , [batching  = off]
   2023-06-01 09:40:21.402 INFO  [0x104514580] ClientConnection:1600 | 
[[::1]:53831 -> [::1]:6650] Connection closed with ConnectError
   Destroying connections
   2023-06-01 09:40:21.403 INFO  [0x104514580] ProducerImpl:697 | Producer - 
[persistent://chariot1/chariot_ns_sre--kms_test/chariot_topic_kms_test-partition-1,
 standalone-36-797] , [batching  = off]
   Destroying connections
   2023-06-01 09:40:21.403 INFO  [0x104514580] ClientConnection:1600 | 
[[::1]:53831 -> [::1]:6650] Connection closed with ConnectError
   2023-06-01 09:40:21.403 INFO  [0x104514580] ProducerImpl:697 | Producer - 
[persistent://chariot1/chariot_ns_sre--kms_test/chariot_topic_kms_test-partition-1,
 standalone-36-797] , [batching  = off]
   2023-06-01 09:40:21.403 INFO  [0x104514580] ClientConnection:1600 | 
[[::1]:53831 -> [::1]:6650] Connection closed with ConnectError
   Destroying connections
   Destroyed connections
   Destroyed connections
   Destroyed connections
   Joining pool
   ```
   
   An `lldb` backtrace is attached to this comment. It looks slightly different 
from the `py-spy` backtrace I provided from a Linux host in production, but 
shows similar defective behavior. 
   
   The problem largely appears to be that *fork-safe programs that use threads 
must assume those threads may vanish without informing the rest of the program* 
(that's what pthread_atfork(3) is for). When a threaded program forks without 
exec'ing, only the thread that called fork(2) exists in the child. All of the 
other threads vanish in the midst of whatever they were doing, and any locks 
they held stay locked in the child with nothing left to release them.
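
   To make the vanishing-threads behavior concrete, here is a minimal 
pure-Python sketch (POSIX only). It does not use `pulsar-client` at all; the 
function name `count_threads_in_child` and the idle worker threads are made up 
for illustration. It starts a few background threads, forks, and reports how 
many threads are still alive on each side of the fork.

```python
import os
import threading


def count_threads_in_child(n_background: int = 3) -> tuple[int, int]:
    """Return (threads alive in the parent before fork, threads alive in the child)."""
    stop = threading.Event()
    # Hypothetical stand-ins for a driver's internal I/O threads: they just idle.
    workers = [
        threading.Thread(target=stop.wait, daemon=True) for _ in range(n_background)
    ]
    for worker in workers:
        worker.start()

    parent_count = sum(t.is_alive() for t in threading.enumerate())

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: only the thread that called fork() was copied into this process.
        os.close(read_fd)
        alive = sum(t.is_alive() for t in threading.enumerate())
        os.write(write_fd, str(alive).encode())
        os._exit(0)  # skip interpreter cleanup in the child

    # Parent: read the child's answer, then reap the child and stop the workers.
    os.close(write_fd)
    child_count = int(os.read(read_fd, 16).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)
    stop.set()
    for worker in workers:
        worker.join()
    return parent_count, child_count
```

   Calling `count_threads_in_child(3)` from a script's main thread should show 
the background threads on the parent side and only the forking thread in the 
child. Recent Python versions may also emit a `DeprecationWarning` when forking 
a multi-threaded process.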
   
   To be fair, [POSIX documents this as unsafe 
behavior](http://www.doublersolutions.com/docs/dce/osfdocs/htmls/develop/appdev/Appde193.htm),
 but it's also the everyday reality for the most common Python application 
harnesses in the world. Most Python isn't single-threaded, nor is it 
single-process. As a result, drivers loaded by Python programs must assume 
those programs may fork, at which point the driver's threads will vanish.
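
   On the application side, one common defensive pattern is to never reuse a 
client object across a fork. The sketch below is an assumption-laden 
illustration, not part of `pulsar-client`: `PerProcessClient` and its `factory` 
argument are hypothetical names. It caches a client per process ID and builds a 
fresh one the first time it is asked in a new process. Whether simply 
abandoning an inherited client object is safe for a given driver is something I 
have not verified.

```python
import os
from typing import Any, Callable, Optional


class PerProcessClient:
    """Hypothetical helper: hand out one client per OS process."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory  # zero-argument callable that builds a client
        self._client: Optional[Any] = None
        self._owner_pid: Optional[int] = None

    def get(self) -> Any:
        pid = os.getpid()
        if self._client is None or self._owner_pid != pid:
            # Either first use, or we are in a forked child holding the parent's
            # client. The inherited object is dropped rather than closed, because
            # its background threads do not exist in this process.
            self._client = self._factory()
            self._owner_pid = pid
        return self._client
```

   With a pool of forked workers, the idea would be to call `holder.get()` 
inside the worker function rather than creating the client at import time, for 
example with a factory such as `lambda: pulsar.Client("pulsar://localhost:6650")`.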
   
   While this is water far under the bridge at this point for `pulsar-client`, 
those realities are one of the reasons why multithreaded drivers are often a 
problematic design: drivers have to work in "hostile environments" 
(embedded interpreters, forking code, thread-constrained or under-resourced 
hosts, driver code invoked from signal handlers or atfork hooks, etc.). Using 
multithreading inside a client library might be safe in languages that tend to 
work in a more uniform "the runtime is the entry point" way, like Go and Java, 
but in languages like Python that are often run in weird ways and/or messed-up 
environments, it can cause problems. The more robust drivers I've used eschew 
multithreading internally, even at the cost of more complex usage APIs for end 
users (e.g. user code must assume the responsibility for "turning" the driver 
event loop and/or performing heartbeat pings). 
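
   As a rough sketch of what "turning the driver event loop" can look like, 
here is a thread-free design in miniature. Everything in it is hypothetical: 
`PollingDriver`, its `poll()` method, and the `PING` heartbeat format are 
invented for illustration and do not correspond to any real Pulsar API. The 
point is only that all I/O happens inside a call the user makes, so there are 
no background threads to lose across a fork.

```python
import selectors
import socket
import time


class PollingDriver:
    """Hypothetical thread-free driver: all I/O happens inside poll()."""

    def __init__(self, sock: socket.socket, heartbeat_interval: float = 30.0) -> None:
        self._sock = sock
        self._sock.setblocking(False)
        self._selector = selectors.DefaultSelector()
        self._selector.register(sock, selectors.EVENT_READ)
        self._heartbeat_interval = heartbeat_interval
        self._next_heartbeat = time.monotonic() + heartbeat_interval

    def poll(self, timeout: float = 0.0) -> list[bytes]:
        """Run one turn of the loop: send a heartbeat if due, then read ready data."""
        now = time.monotonic()
        if now >= self._next_heartbeat:
            self._sock.send(b"PING\n")  # made-up heartbeat frame
            self._next_heartbeat = now + self._heartbeat_interval

        received = []
        # select() waits up to `timeout` seconds for the socket to become readable.
        for _key, _events in self._selector.select(timeout):
            data = self._sock.recv(4096)
            if data:
                received.append(data)
        return received

    def close(self) -> None:
        self._selector.close()
        self._sock.close()
```

   The cost is visible in the usage: the application has to call `poll()` 
often enough to keep heartbeats flowing, which is exactly the more complex API 
trade-off described above.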


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

