zbentley commented on issue #10721:
URL: https://github.com/apache/pulsar/issues/10721#issuecomment-909692933


   The issue still occurs on `master`, which includes 
https://github.com/apache/pulsar/pull/11029. I built _pulsar.so against 
`master` and re-ran the above snippet twice: once against a running broker 
that I SIGSTOPped mid-run, and once against an already SIGSTOPped broker.
   
   Both runs exhibited the bug (both hung uninterruptibly), though on `master` 
the first run emitted more logs and retried internally.
   
   It seems the Pulsar client really wants to handle connection retries 
internally, but that interacts poorly with the assumptions made by 
single-threaded calling code, such as Python.
   
   Is it possible to disable that functionality entirely? That is, could the 
Python Client/producer/consumer objects each represent a single open socket 
upstream directly, rather than a lazy object that the Pulsar client may 
connect or disconnect internally whenever it wants? I'd happily make that 
trade (having to handle reconnects etc. myself) in exchange for knowing that 
client operations would reliably time out when things go wrong.
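
   To make the "reliably time out" requirement concrete, here is a minimal 
sketch of how calling code could enforce a hard deadline from the outside today. 
It is not part of the Pulsar client: `call_with_deadline` and `stuck_call` are 
hypothetical names, and `stuck_call` is only a stand-in for a client operation 
that never returns. The sketch assumes a POSIX system (it uses the `fork` start 
method) and that the call's result can be pickled. Running the call in a child 
process matters because a hang inside native code cannot be interrupted from 
the same Python process, whereas a child can always be killed.

```python
import multiprocessing as mp
import time


def _run_in_child(conn, func, args):
    # Executed in the child process: send back the result or the error text.
    try:
        conn.send(("ok", func(*args)))
    except Exception as exc:
        conn.send(("error", repr(exc)))
    finally:
        conn.close()


def call_with_deadline(func, args=(), timeout_s=5.0):
    """Run func(*args) in a child process and give up after timeout_s seconds.

    Hypothetical helper, not a Pulsar API. Raises TimeoutError when the call
    does not finish in time and RuntimeError when the call itself fails.
    """
    ctx = mp.get_context("fork")  # assumption: POSIX; avoids pickling func
    recv_end, send_end = ctx.Pipe(duplex=False)
    proc = ctx.Process(
        target=_run_in_child, args=(send_end, func, args), daemon=True
    )
    proc.start()
    send_end.close()  # the parent only reads
    try:
        if not recv_end.poll(timeout_s):
            raise TimeoutError(f"no result within {timeout_s} seconds")
        try:
            status, payload = recv_end.recv()
        except EOFError:
            raise RuntimeError("child exited without sending a result")
    finally:
        if proc.is_alive():
            proc.kill()  # SIGKILL also ends a child stuck in native code
        proc.join()
        recv_end.close()
    if status == "error":
        raise RuntimeError(payload)
    return payload


def stuck_call():
    # Stand-in for a client operation that hangs; not a real Pulsar call.
    time.sleep(3600)


if __name__ == "__main__":
    print(call_with_deadline(pow, (2, 5)))
    try:
        call_with_deadline(stuck_call, timeout_s=1.0)
    except TimeoutError as exc:
        print("timed out:", exc)
```

   The obvious cost is that the client object lives and dies with the child 
process, so this only suits short, self-contained operations. That is why a 
timeout honored inside the client itself would be preferable.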
   
   Here are the logs and stacks from the run during which I SIGSTOPped the broker:
   [logs.txt](https://github.com/apache/pulsar/files/7087087/logs.txt)
   [stacks.txt](https://github.com/apache/pulsar/files/7087088/stacks.txt)
   
   And here are the logs and stacks from the one that I ran against an already 
SIGSTOPped broker:
   [logs.txt](https://github.com/apache/pulsar/files/7087096/logs.txt)
   [stacks.txt](https://github.com/apache/pulsar/files/7087097/stacks.txt)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
