codelipenghui opened a new pull request #14231:
URL: https://github.com/apache/pulsar/pull/14231


   ### Background
   
   This PR fixes a race condition introduced by https://github.com/apache/pulsar/pull/11884, which added a struct to maintain pending messages. The race is between `foreach()` and `peek()`: if a new message is sent to the broker and its receipt arrives while a `foreach` is in progress, the producer peeks a null message from the `OpSendMsgQueue`. I have added some logs to confirm the issue.
   
   Here are the logs which can explain the bug:
   
   ```
   2022-02-11T12:40:00,571+0800 [pulsar-timer-5-1] ERROR org.apache.pulsar.client.impl.ProducerImpl - For each OpSendMsgQueue, 0
   2022-02-11T12:40:00,572+0800 [pulsar-timer-5-1] INFO  org.apache.pulsar.client.impl.ProducerStatsRecorderImpl - [public/default/t_topic] [standalone-0-3] Pending messages: 1 --- Publish throughput: 0.99 msg/s --- 0.00 Mbit/s --- Latency: med: 0.000 ms - 95pct: 0.000 ms - 99pct: 0.000 ms - 99.9pct: 0.000 ms - max: -∞ ms --- Ack received rate: 0.00 ack/s --- Failed messages: 0
   2022-02-11T12:40:01,564+0800 [pulsar-client-io-1-1] ERROR org.apache.pulsar.client.impl.ProducerImpl - Add message to OpSendMsgQueue.postponedOpSendMgs, 1, 182801
   2022-02-11T12:40:01,566+0800 [pulsar-client-io-1-1] INFO  org.apache.pulsar.client.impl.ProducerImpl - [public/default/s_topic] [standalone-0-6] Got ack for timed out msg 182801 - 182900
   2022-02-11T12:40:01,573+0800 [pulsar-timer-5-1] ERROR org.apache.pulsar.client.impl.ProducerImpl - For each OpSendMsgQueue, 0
   2022-02-11T12:40:01,573+0800 [pulsar-timer-5-1] INFO  org.apache.pulsar.client.impl.ProducerImpl - Put the opsend back to deque of sequenceID 182801
   ```
   
   From the logs, you can see that the message with sequence ID 182801 is added to the `OpSendMsgQueue` first (it lands in `postponedOpSendMgs`). When the producer then receives the receipt, the log shows `Got ack for timed out msg`, which means the producer got null when peeking the queue. Afterwards, the message is added back to the internal queue, but by then the producer side is blocked.
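
To make the sequence in the logs easier to follow, here is a minimal single-threaded sketch of the symptom. It is not the Pulsar implementation: `PostponingQueueModel`, `beginIteration()` and `endIteration()` are hypothetical names, where the two methods stand in for the start and end of a `foreach` pass, and the model assumes that adds made during a pass are parked in a separate list until the pass ends.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Replays the order of events seen in the logs against the model below. */
final class PostponedPeekDemo {
    static List<String> run() {
        PostponingQueueModel queue = new PostponingQueueModel();
        List<String> events = new ArrayList<>();

        queue.beginIteration();   // timer thread starts a foreach pass
        queue.add(182801L);       // send path adds the op: it is postponed
        // The receipt arrives now; the ack handler peeks and finds nothing.
        events.add("peek during iteration: " + queue.peek());
        queue.endIteration();     // postponed op is put back into the deque
        events.add("peek after iteration: " + queue.peek());
        return events;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}

/** Hypothetical, simplified model of a queue that postpones adds during iteration. */
final class PostponingQueueModel {
    private final Queue<Long> delegate = new ArrayDeque<>();
    private final List<Long> postponed = new ArrayList<>();
    private int iterationDepth = 0;

    void beginIteration() {
        iterationDepth++;
    }

    void endIteration() {
        iterationDepth--;
        if (iterationDepth == 0) {
            // Flush everything that was added while the pass was running.
            delegate.addAll(postponed);
            postponed.clear();
        }
    }

    void add(long sequenceId) {
        if (iterationDepth > 0) {
            postponed.add(sequenceId); // invisible to peek() until the flush
        } else {
            delegate.add(sequenceId);
        }
    }

    Long peek() {
        return delegate.peek(); // only looks at the main deque
    }
}
```

In this model the op exists the whole time, but `peek()` cannot see it while it sits in the postponed list, which matches the order of the `Add message to OpSendMsgQueue.postponedOpSendMgs`, `Got ack for timed out msg` and `Put the opsend back to deque` log lines.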
   
   ### Modification
   
   1. Avoid using `foreach` to iterate over the pending ops; use an iterator instead
   2. Keep using `OpSendMsgQueue` to avoid exposing the `foreach` method
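
The direction described in the list above could look roughly like the following sketch. It is an illustration under stated assumptions, not the code in this PR: `PendingOpsModel` is a hypothetical class, it exposes only `iterator()` (no `foreach` and no postponed list), and it assumes that every caller holds the queue's monitor while iterating.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;

/** Iterates pending ops with an explicit iterator instead of foreach. */
final class IteratorQueueDemo {
    static String run() {
        PendingOpsModel queue = new PendingOpsModel();
        queue.add(1L);
        queue.add(2L);
        queue.add(3L);

        List<Long> expired = new ArrayList<>();
        // Assumption: iteration and mutation are guarded by the same lock.
        synchronized (queue) {
            Iterator<Long> it = queue.iterator();
            while (it.hasNext()) {
                long sequenceId = it.next();
                if (sequenceId <= 2L) {   // pretend ops 1 and 2 timed out
                    expired.add(sequenceId);
                    it.remove();          // safe removal through the iterator
                }
            }
        }

        queue.add(4L); // goes straight into the deque, nothing is postponed
        return "expired=" + expired + ", head=" + queue.peek();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}

/** Hypothetical pending-op queue: adds are always immediately visible to peek(). */
final class PendingOpsModel {
    private final Queue<Long> delegate = new ArrayDeque<>();

    synchronized void add(long sequenceId) {
        delegate.add(sequenceId);
    }

    synchronized Long peek() {
        return delegate.peek();
    }

    /** Callers are expected to hold this object's monitor while iterating. */
    Iterator<Long> iterator() {
        return delegate.iterator();
    }
}
```

Because there is no second list, an op added by the send path is visible to `peek()` as soon as `add` returns; the cost is that iteration and mutation have to be serialized by the caller.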
   
   ### Documentation
   
   Check the box below or label this PR directly (if you have committer 
privilege).
   
   Need to update docs? 
   
   - [ ] `doc-required` 
     
     (If you need help on updating docs, create a doc issue)
     
   - [x] `no-need-doc` 
     
     (Please explain why)
     
   - [ ] `doc` 
     
     (If this PR contains doc changes)
   
   
   

