devinbost commented on issue #6054: URL: https://github.com/apache/pulsar/issues/6054#issuecomment-832454134
It turns out that the client producer stops receiving acks from the broker once the subscription has frozen. The acks simply stop arriving. I walked through the code flow all the way from when the producer starts sending a message to when the ack is sent back to the client. Simplified, that flow looks like this:

Client builds the `newSend` command and drops it into the executor -> `PulsarDecoder.handleSend(..)` -> `ServerCnx.handleSend(..)` -> `Producer.publishMessage(..)` -> `PersistentTopic.publishMessage(..)`, which writes to the ledger and triggers a callback on the `Producer.MessagePublishContext` instance, which triggers `MessagePublishContext.run()` -> `Producer.ServerCnx.getCommandSender().sendSendReceiptResponse(..)`, which writes a new `CommandSendReceipt` to the Netty channel. That receipt then gets picked up by `PulsarDecoder.handleSendReceipt(..)` -> `ClientCnx.handleSendReceipt(..)`, which writes the log message:

> Got receipt for producer . . .

But that log line is never reached when the subscription has frozen. So the question is: where in that flow did it stop?

My plan is to add debug statements to each method in that flow in a custom build to pinpoint where the flow stops. It very much looks like another concurrency issue.
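As a rough illustration of that debugging plan, the sketch below records the furthest checkpoint each message reaches and reports the messages that never got a receipt. Everything in it is hypothetical: `FlowTrace`, `Stage`, `mark`, and `stalled` are made-up names, not Pulsar APIs, and the stage labels only mirror the method names in the flow above. It also keeps everything in one in-memory map for simplicity, whereas the real flow spans the client and the broker, so in a custom build these checkpoints would more likely be debug log lines correlated by producer and sequence ID.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not part of Pulsar: tracks the furthest checkpoint per message.
class FlowTrace {
    // Checkpoints in the order of the flow described above (labels are illustrative).
    enum Stage {
        CLIENT_NEW_SEND,
        SERVER_CNX_HANDLE_SEND,
        PRODUCER_PUBLISH_MESSAGE,
        TOPIC_PUBLISH_MESSAGE,
        PUBLISH_CONTEXT_RUN,
        SEND_RECEIPT_RESPONSE,
        CLIENT_HANDLE_SEND_RECEIPT
    }

    // Thread-safe because the checkpoints would be hit from different threads.
    private final Map<Long, Stage> lastStage = new ConcurrentHashMap<>();

    // Record a checkpoint, keeping only the furthest stage seen for this sequence ID.
    void mark(long sequenceId, Stage stage) {
        lastStage.merge(sequenceId, stage,
                (old, cur) -> cur.ordinal() > old.ordinal() ? cur : old);
    }

    // Describe every message that has not reached the final stage (receipt on the client).
    List<String> stalled() {
        List<String> out = new ArrayList<>();
        lastStage.forEach((seq, stage) -> {
            if (stage != Stage.CLIENT_HANDLE_SEND_RECEIPT) {
                out.add("seq " + seq + " last seen at " + stage);
            }
        });
        return out;
    }

    public static void main(String[] args) {
        FlowTrace trace = new FlowTrace();
        // Message 1 completes the whole round trip.
        for (Stage s : Stage.values()) {
            trace.mark(1L, s);
        }
        // Message 2 gets as far as the topic publish and then goes quiet.
        trace.mark(2L, Stage.CLIENT_NEW_SEND);
        trace.mark(2L, Stage.SERVER_CNX_HANDLE_SEND);
        trace.mark(2L, Stage.PRODUCER_PUBLISH_MESSAGE);
        trace.mark(2L, Stage.TOPIC_PUBLISH_MESSAGE);
        trace.stalled().forEach(System.out::println);
    }
}
```

Keeping only the furthest stage per sequence ID means out-of-order checkpoint calls from different threads do not hide how far a message actually got.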
