codelipenghui opened a new pull request #14231: URL: https://github.com/apache/pulsar/pull/14231
### Background

This issue is a race condition introduced by https://github.com/apache/pulsar/pull/11884, which added a structure (`OpSendMsgQueue`) to maintain pending messages. The race is between `forEach()` and `peek()`: if a new message is sent to the broker and its receipt arrives while a `forEach` is in progress, the producer peeks a null message from the `OpSendMsgQueue`. I added some logs to confirm the issue. Here are the logs that explain the bug:

```
2022-02-11T12:40:00,571+0800 [pulsar-timer-5-1] ERROR org.apache.pulsar.client.impl.ProducerImpl - For each OpSendMsgQueue, 0
2022-02-11T12:40:00,572+0800 [pulsar-timer-5-1] INFO org.apache.pulsar.client.impl.ProducerStatsRecorderImpl - [public/default/t_topic] [standalone-0-3] Pending messages: 1 --- Publish throughput: 0.99 msg/s --- 0.00 Mbit/s --- Latency: med: 0.000 ms - 95pct: 0.000 ms - 99pct: 0.000 ms - 99.9pct: 0.000 ms - max: -∞ ms --- Ack received rate: 0.00 ack/s --- Failed messages: 0
2022-02-11T12:40:01,564+0800 [pulsar-client-io-1-1] ERROR org.apache.pulsar.client.impl.ProducerImpl - Add message to OpSendMsgQueue.postponedOpSendMgs, 1, 182801
2022-02-11T12:40:01,566+0800 [pulsar-client-io-1-1] INFO org.apache.pulsar.client.impl.ProducerImpl - [public/default/s_topic] [standalone-0-6] Got ack for timed out msg 182801 - 182900
2022-02-11T12:40:01,573+0800 [pulsar-timer-5-1] ERROR org.apache.pulsar.client.impl.ProducerImpl - For each OpSendMsgQueue, 0
2022-02-11T12:40:01,573+0800 [pulsar-timer-5-1] INFO org.apache.pulsar.client.impl.ProducerImpl - Put the opsend back to deque of sequenceID 182801
```

From the logs, you can see that the message with sequence ID 182801 is first added to the `OpSendMsgQueue` (it lands in `postponedOpSendMgs`). When the producer then receives the receipt, the log shows `Got ack for timed out msg`, which means the producer got null when peeking the pending messages. Afterwards the message is put back into the internal queue, but by then the producer side is blocked.

### Modification

1. Avoid using `forEach` to iterate over the pending ops; use an iterator instead.
2. Keep using `OpSendMsgQueue` to avoid exposing the `forEach` method.

### Documentation

Check the box below or label this PR directly (if you have committer privilege).

Need to update docs?

- [ ] `doc-required` (If you need help on updating docs, create a doc issue)
- [x] `no-need-doc` (Please explain why)
- [ ] `doc` (If this PR contains doc changes)
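
The interleaving described in the Background section can be replayed deterministically with a toy model. The sketch below is hypothetical: `ToyPendingQueue`, `beginForEach`, `endForEach`, and `PeekRaceSketch` are illustrative names, not Pulsar's classes. It assumes only a queue that parks entries added while a `forEach` is in progress and a `peek` that ignores the parked entries. The two halves of `forEach` are called directly to pin down the moment when another thread is mid-iteration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical, simplified stand-in for a pending-message queue (not Pulsar's class).
class ToyPendingQueue {
    private final Queue<Long> delegate = new ArrayDeque<>();
    private final List<Long> postponed = new ArrayList<>();
    private int forEachDepth = 0;

    // First half of forEach: mark that an iteration is in progress.
    void beginForEach() {
        forEachDepth++;
    }

    // Second half of forEach: once the outermost iteration ends, flush parked entries.
    void endForEach() {
        forEachDepth--;
        if (forEachDepth == 0) {
            delegate.addAll(postponed);
            postponed.clear();
        }
    }

    void forEach(Consumer<Long> action) {
        beginForEach();
        try {
            delegate.forEach(action);
        } finally {
            endForEach();
        }
    }

    // While an iteration is in progress, new entries are parked instead of enqueued.
    boolean add(long sequenceId) {
        if (forEachDepth > 0) {
            return postponed.add(sequenceId);
        }
        return delegate.add(sequenceId);
    }

    // peek only looks at the delegate, so parked entries are invisible to it.
    Long peek() {
        return delegate.peek();
    }
}

class PeekRaceSketch {
    // Replays the interleaving and returns {peek during iteration, peek after iteration}.
    static Long[] replayInterleaving(long sequenceId) {
        ToyPendingQueue queue = new ToyPendingQueue();
        queue.beginForEach();                 // another thread is inside forEach
        queue.add(sequenceId);                // send path enqueues: entry is parked
        Long duringIteration = queue.peek();  // receipt handler peeks the queue
        queue.endForEach();                   // iteration ends: parked entry is flushed
        Long afterIteration = queue.peek();
        return new Long[] {duringIteration, afterIteration};
    }

    public static void main(String[] args) {
        Long[] seen = replayInterleaving(182801L);
        System.out.println("peek during forEach: " + seen[0]);
        System.out.println("peek after forEach:  " + seen[1]);
    }
}
```

Running `PeekRaceSketch` prints `null` for the first peek and `182801` for the second. This mirrors the order in the logs: the receipt handler finds no pending entry, and the entry only becomes visible once the iteration has finished.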
