poorbarcode opened a new pull request, #20695: URL: https://github.com/apache/pulsar/pull/20695
reopen https://github.com/apache/pulsar/pull/20591 ### Motivation **Background of consumer reconnects** - grab connection: - lookup the broker which owned the topic - get the existing connection by the broker; create one if it does not exist - clear the messages in memory - send `CMD-subscribe` to the broker - send `flow permits` to broker to increment `availablePermits` --- **Issue** If the consumer tries to reconnect<sup>[1]</sup> multi times, it will lose some messages due to a race condition, for example: | time | reconnect-1 | reconnect-2 | | --- | --- | --- | | 1 | Grab connection | | 2 | | Grab connection | | 3 | Clear messages in memory | | 4 | Do subscribe and increase `availablePermits` | | 5 | receive 712 messages | | 6 | | Clear messages in memory. <strong>(Highlight)</strong> 712 messages were lost | | 7 | Do subscribe and increase `availablePermits`(Since a subscription is there, so broker just response success) | We can see that the variable `availablePermits` of all the stuck consumers is large than `1000`(by default, the max value is `1000`), and `${availablePermits} + ${msgOutCounter}` is an integer multiple of `1000`. It means the broker received more than one `CMD-flow` of the same consumer<sup>[2]</sup>. --- ### Footnotes **[2]**: Topic stats ``` { "msgOutCounter" : 712, "availablePermits" : 1288, // 712 + 1288 = 2000 "unackedMessages" : 707, "consumerName" : "fen-prod-gke-usw4-a-ping-779cfdccd4-np479", "address" : "/127.0.0.6:39213", "connectedSince" : "2023-06-12T14:17:35.565971Z", "clientVersion" : "3.1.1" },{ "msgOutCounter" : 678, "availablePermits" : 2322, // 678 + 2322 = 3000 "unackedMessages" : 676, "consumerName" : "fen-prod-gke-usw4-b-ping-888c9c854-w7zgn", "address" : "/127.0.0.6:48109", "connectedSince" : "2023-06-12T14:17:38.72818Z", "clientVersion" : "3.1.1" },{ "msgOutCounter" : 723, "availablePermits" : 2277, // 723 + 2277 = 3000 "unackedMessages" : 721, "consumerName" : "fen-prod-gke-usw4-a-ping-779cfdccd4-nbgzf", "address" : "/127.0.0.6:51253", "connectedSince" : "2023-06-12T14:17:39.182882Z", "clientVersion" : "3.1.1" },{ "msgOutCounter" : 717, "availablePermits" : 1283, // 717 + 1283 = 2000 "unackedMessages" : 716, "consumerName" : "fen-prod-gke-usw4-b-ping-888c9c854-48h74", "address" : "/127.0.0.6:52277", "connectedSince" : "2023-06-12T14:17:41.544532Z", "clientVersion" : "3.1.1" },{ "msgOutCounter" : 707, "availablePermits" : 2293, // 707 + 2293 = 3000 "unackedMessages" : 706, "consumerName" : "fen-prod-gke-usw4-b-ping-888c9c854-5x827", "address" : "/127.0.0.6:52153", "connectedSince" : "2023-06-12T14:17:43.024048Z", "clientVersion" : "3.1.1" },{ "msgOutCounter" : 711, "availablePermits" : 2289, // 711 + 2289 = 3000 "unackedMessages" : 705, "consumerName" : "fen-prod-gke-usw4-a-ping-779cfdccd4-rmpjf", "address" : "/127.0.0.6:40349", "connectedSince" : "2023-06-12T14:17:43.499337Z", "clientVersion" : "3.1.1" },{ "msgOutCounter" : 724, "availablePermits" : 1276, // 724 + 1276 = 3000 "unackedMessages" : 721, "consumerName" : "fen-prod-gke-usw4-b-ping-888c9c854-gbqsr", "address" : "/127.0.0.6:46283", "connectedSince" : "2023-06-12T14:17:44.110605Z", "clientVersion" : "3.1.1" },{ "msgOutCounter" : 699, "availablePermits" : 1301, "unackedMessages" : 698, "consumerName" : "fen-prod-gke-usw4-b-ping-888c9c854-jmjrs", "address" : "/127.0.0.6:53431", "connectedSince" : "2023-06-12T14:17:45.891044Z", "clientVersion" : "3.1.1" },{ "msgOutCounter" : 720, "availablePermits" : 1280, "unackedMessages" : 720, "consumerName" : "fen-prod-gke-usw4-b-ping-888c9c854-shxvs", "address" : "/127.0.0.6:34651", "connectedSince" : "2023-06-12T14:17:47.672162Z", "clientVersion" : "3.1.1" },{ "msgOutCounter" : 735, "availablePermits" : 1265, "unackedMessages" : 735, "consumerName" : "fen-prod-gke-usw4-a-ping-779cfdccd4-m6jxb", "address" : "/127.0.0.6:52331", "connectedSince" : "2023-06-12T14:17:49.946036Z", "clientVersion" : "3.1.1" },{ "msgOutCounter" : 733, "availablePermits" : 4267, "unackedMessages" : 728, "consumerName" : "fen-prod-gke-usw4-b-ping-888c9c854-74dns", "address" : "/127.0.0.6:38375", "connectedSince" : "2023-06-12T14:17:54.845697Z", "clientVersion" : "3.1.1" } ``` ### Modifications Discard the task `subscribe` if there is an in-flight subscribe ### Documentation <!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. --> - [ ] `doc` <!-- Your PR contains doc changes. --> - [ ] `doc-required` <!-- Your PR changes impact docs and you will update later --> - [x] `doc-not-needed` <!-- Your PR changes do not impact docs --> - [ ] `doc-complete` <!-- Docs have been already added --> ### Matching PR in forked repository PR in forked repository: x -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
