suiyuzeng opened a new issue #1244: Return FLUSH_SLAVE_TIMEOUT while message is sent successfully
URL: https://github.com/apache/rocketmq/issues/1244

Config:

broker: master
    broker role: SYNC_MASTER
    sendMessageThreadPoolNums=4

topic:
    Topic_A: qps: 3w (about 30,000), WaitStoreMsgOK: false
    Topic_B: qps: 100, WaitStoreMsgOK: true

The producer of Topic_B receives the code FLUSH_SLAVE_TIMEOUT when messages are sent to both topics at the same time. The time taken does not exceed the timeout, and there is no problem with the synchronization between master and slave.

The cause is how the GroupCommitRequest is woken up in org.apache.rocketmq.store.ha.HAService.GroupTransferService#doWaitTransfer:

    for (int i = 0; !transferOK && i < 5; i++) {
        this.notifyTransferObject.waitForRunning(1000);
        transferOK = HAService.this.push2SlaveMaxOffset.get() >= req.getNextOffset();
    }

Because the QPS of Topic_A is much higher than that of Topic_B and sendMessageThreadPoolNums is set to 4, traffic on Topic_A triggers waitForRunning five times before the synchronization of Topic_B has completed, so the loop exits with transferOK still false.

In org.apache.rocketmq.store.CommitLog#handleHA, the GroupCommitRequest is added to the HA service and the caller waits until the timeout.

In doWaitTransfer, how about checking whether the request has expired? Add an expire timestamp to GroupCommitRequest and check it in the loop, for example:

    while (!transferOK && defaultMessageStore.getSystemClock().now() < req.getExpireTimestamp()) {
        this.notifyTransferObject.waitForRunning(1000);
        transferOK = HAService.this.push2SlaveMaxOffset.get() >= req.getNextOffset();
    }
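
To make the difference between the two loops concrete, here is a minimal, self-contained sketch. It is not RocketMQ code: the Waiter interface, the BusyBroker class, and the simulated clock are hypothetical stand-ins for notifyTransferObject, push2SlaveMaxOffset, and the system clock. The simulation assumes every waitForRunning call is cut short after 1 ms by an unrelated wakeup, and that the slave acknowledges the requested offset after 20 ms.

```java
import java.util.function.LongSupplier;

public class DeadlineWaitSketch {

    /** Hypothetical stand-in for notifyTransferObject.waitForRunning(interval). */
    interface Waiter {
        void waitForRunning(long intervalMs);
    }

    /** Count-bounded loop, mirroring the shape of the loop quoted in the issue. */
    static boolean waitByCount(LongSupplier slaveOffset, long nextOffset, Waiter waiter) {
        boolean transferOK = slaveOffset.getAsLong() >= nextOffset;
        for (int i = 0; !transferOK && i < 5; i++) {
            waiter.waitForRunning(1000);
            transferOK = slaveOffset.getAsLong() >= nextOffset;
        }
        return transferOK;
    }

    /** Deadline-bounded loop, mirroring the change proposed in the issue. */
    static boolean waitByDeadline(LongSupplier slaveOffset, long nextOffset, Waiter waiter,
                                  LongSupplier clock, long expireTimestamp) {
        boolean transferOK = slaveOffset.getAsLong() >= nextOffset;
        while (!transferOK && clock.getAsLong() < expireTimestamp) {
            waiter.waitForRunning(1000);
            transferOK = slaveOffset.getAsLong() >= nextOffset;
        }
        return transferOK;
    }

    /**
     * Simulated broker under heavy unrelated traffic (an assumption of this sketch):
     * each wait is interrupted after 1 ms of simulated time, and the slave reports
     * ackedOffset once the simulated clock reaches ackAtMs.
     */
    static final class BusyBroker implements Waiter {
        private long nowMs = 0;
        private final long ackAtMs;
        private final long ackedOffset;

        BusyBroker(long ackAtMs, long ackedOffset) {
            this.ackAtMs = ackAtMs;
            this.ackedOffset = ackedOffset;
        }

        @Override
        public void waitForRunning(long intervalMs) {
            // Woken early by another request: only 1 ms passes, not intervalMs.
            nowMs += Math.min(1, intervalMs);
        }

        long slaveOffset() {
            return nowMs >= ackAtMs ? ackedOffset : 0;
        }

        long now() {
            return nowMs;
        }
    }

    public static void main(String[] args) {
        BusyBroker a = new BusyBroker(20, 100);
        boolean byCount = waitByCount(a::slaveOffset, 100, a);

        BusyBroker b = new BusyBroker(20, 100);
        boolean byDeadline = waitByDeadline(b::slaveOffset, 100, b, b::now, 5000);

        // prints "count-bounded: false, deadline-bounded: true"
        System.out.println("count-bounded: " + byCount + ", deadline-bounded: " + byDeadline);
    }
}
```

In this simulation the count-bounded loop gives up after 5 ms of simulated time even though the nominal budget is 5 seconds, while the deadline-bounded loop keeps waiting until the slave catches up or the expire timestamp passes.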
