GenerousMan opened a new issue, #6662: URL: https://github.com/apache/rocketmq/issues/6662
### Before Creating the Bug Report

- [x] I found a bug, not just asking a question, which should be created in [GitHub Discussions](https://github.com/apache/rocketmq/discussions).
- [x] I have searched the [GitHub Issues](https://github.com/apache/rocketmq/issues) and [GitHub Discussions](https://github.com/apache/rocketmq/discussions) of this repository and believe that this is not a duplicate.
- [x] I have confirmed that this bug belongs to the current repository, not other repositories of RocketMQ.

### Describe the Bug

In controller mode, a network partition can leave the cluster with two masters. When the abnormal master (the one not recognized by the controller) is partitioned together with a slave, its confirmOffset may change, which moves the consumption offset forward. This may result in message loss.

### Steps to Reproduce

Deploy 3 controllers and 3 brokers. Running the random-partition fault from openchaos may reproduce the bug.

### What Did You Expect to See?

No messages lost.

### What Did You See Instead?

### What Version Are You Using?

v5.1.0

### Environment

Compiler: openjdk version "1.8.0_362"
OS: CentOS

### Additional Context

The bug may be in the following logic.

When the isolated master (A) has not connected to the current abnormal master (B) for a long time, the HAConnection for A held by the abnormal master (B) may be cleaned up.

When the slave (C) synchronizes messages with master (B), it requests the confirmOffset. At that point the confirmOffset calculation excludes A, so the confirmOffset moves forward.

Once the confirmOffset has moved, the reputMessageService builds the newly confirmed messages into the consumeQueue.
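The effect described above can be illustrated with a minimal, self-contained Java sketch. It is not RocketMQ's actual code: the class `ConfirmOffsetSketch`, both method names, and the use of broker names as map keys are hypothetical, and the sketch assumes that the confirmOffset is the minimum of the master's own max offset and the acknowledged offsets of the slaves that currently hold a connection. The second method models the kind of check proposed in this report, where the confirmOffset is left unchanged unless every broker in the syncStateSet has an established connection.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names and rules are assumptions, not RocketMQ's API.
class ConfirmOffsetSketch {

    // Assumed naive rule: minimum over the master's max offset and the
    // acknowledged offsets of slaves that currently have a connection.
    public static long confirmOffsetFromConnections(long masterMaxOffset,
                                                    Map<String, Long> connectedAckOffsets) {
        long confirm = masterMaxOffset;
        for (long ack : connectedAckOffsets.values()) {
            confirm = Math.min(confirm, ack);
        }
        return confirm;
    }

    // Guarded rule: if any syncStateSet member other than the master has no
    // connection, keep the previous confirmOffset instead of advancing it.
    public static long confirmOffsetGuarded(String master,
                                            long masterMaxOffset,
                                            Set<String> syncStateSet,
                                            Map<String, Long> connectedAckOffsets,
                                            long previousConfirmOffset) {
        for (String broker : syncStateSet) {
            if (!broker.equals(master) && !connectedAckOffsets.containsKey(broker)) {
                return previousConfirmOffset;
            }
        }
        return confirmOffsetFromConnections(masterMaxOffset, connectedAckOffsets);
    }

    public static void main(String[] args) {
        // B is the abnormal master; its syncStateSet is stale because it
        // cannot reach the controller.
        Set<String> syncStateSet = new HashSet<>();
        syncStateSet.add("A");
        syncStateSet.add("B");
        syncStateSet.add("C");

        Map<String, Long> connections = new HashMap<>();
        connections.put("A", 100L); // A acknowledged up to offset 100
        connections.put("C", 200L); // C acknowledged up to offset 200

        long before = confirmOffsetFromConnections(200L, connections);
        System.out.println("before cleanup: " + before);

        // The connection for A is cleaned up after a long disconnect.
        connections.remove("A");

        long naive = confirmOffsetFromConnections(200L, connections);
        long guarded = confirmOffsetGuarded("B", 200L, syncStateSet, connections, before);
        System.out.println("naive after cleanup: " + naive);
        System.out.println("guarded after cleanup: " + guarded);
    }
}
```

In this sketch the naive rule jumps from 100 to 200 once the connection for A disappears, while the guarded rule stays at 100 because A is still listed in the stale syncStateSet.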
After these messages have been consumed, they are truncated when the network partition ends, and new messages cannot be consumed.

Since the abnormal master is isolated from the controller, its syncStateSet is not updated successfully. We can check whether all the brokers in the syncStateSet have established connections.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
