zbentley commented on issue #10721: URL: https://github.com/apache/pulsar/issues/10721#issuecomment-909692933
On master, which has merged https://github.com/apache/pulsar/pull/11029, the issue still occurs.

I built _pulsar.so against `master` and re-ran the above snippet twice: once against a running broker that I SIGSTOPped, and once against an already SIGSTOPped broker. Both runs exhibited the bug (both uninterruptibly hung), though the first run emitted more logs and retried internally while on `master`.

It seems like the Pulsar client really wants to handle internal connect retries, but that interacts poorly with assumptions made by single-threaded calling code, like Python. Is it possible to disable that functionality entirely? I.e. let the Python Client/producer/consumer objects represent a single open socket upstream directly, rather than a lazy object that the Pulsar client may internally connect/disconnect on a socket whenever it wants? I'd happily make that trade (having to handle reconnects etc.) in exchange for knowing that client operations would reliably time out when things go wrong.

Here are the logs and stacks from the one that ran during the SIGSTOP:
[logs.txt](https://github.com/apache/pulsar/files/7087087/logs.txt)
[stacks.txt](https://github.com/apache/pulsar/files/7087088/stacks.txt)

And here are the logs and stacks from the one that I ran against an already SIGSTOPped broker:
[logs.txt](https://github.com/apache/pulsar/files/7087096/logs.txt)
[stacks.txt](https://github.com/apache/pulsar/files/7087097/stacks.txt)
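
To illustrate the "reliably time out" behavior requested above, here is a minimal caller-side sketch. It is not part of the Pulsar client; `call_with_deadline` is a hypothetical helper name, and `time.sleep` stands in for a client operation that hangs. It assumes the blocked native call releases the GIL while waiting. If it does not, the main thread would stall as well, and the call would have to be isolated in a separate process instead.

```python
import queue
import threading
import time


def call_with_deadline(fn, *args, timeout_s, **kwargs):
    """Run fn(*args, **kwargs) in a daemon thread (hypothetical helper).

    Returns fn's result, re-raises fn's exception, or raises TimeoutError
    if fn has not finished within timeout_s seconds. On timeout the worker
    thread is abandoned, not cancelled; being a daemon thread, it does not
    block interpreter exit.
    """
    outcome = queue.Queue(maxsize=1)

    def runner():
        try:
            outcome.put(("ok", fn(*args, **kwargs)))
        except BaseException as exc:  # hand any failure back to the caller
            outcome.put(("error", exc))

    threading.Thread(target=runner, daemon=True).start()
    try:
        kind, value = outcome.get(timeout=timeout_s)
    except queue.Empty:
        raise TimeoutError(f"call did not finish within {timeout_s}s") from None
    if kind == "error":
        raise value
    return value


if __name__ == "__main__":
    # time.sleep(5) is only a stand-in for a client call that hangs.
    try:
        call_with_deadline(time.sleep, 5, timeout_s=0.2)
    except TimeoutError as exc:
        print("timed out:", exc)
```

This only returns control to the caller. The abandoned call keeps whatever socket or lock it holds, so it is a stopgap and not a substitute for timeouts inside the client itself.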
