Zanglei06 opened a new issue #2732: URL: https://github.com/apache/rocketmq/issues/2732
1. Please describe the issue you observed:
- What did you do (The steps to reproduce)?
- What did you expect to see?
- What did you see instead?

In our production environment, some messages were lost when a new consumer started (which triggered a rebalance). We run RMQ 4.7.1 and use the new LitePullConsumer API. The rocketmq-client log shows several unexpected things:

1. Two different threads detected, at almost the same time, that the pull task for the same MessageQueue should be cancelled.

2021-03-09 20:16:19.911 WARN [PullMsgThread-c_g1] (Slf4jLoggerFactory.java:115) - The Pull Task is cancelled after doPullTask, MessageQueue [topic=t, brokerName=rmq-b5, queueId=3]
2021-03-09 20:16:19.911 WARN [PullMsgThread-c_g2] (Slf4jLoggerFactory.java:115) - The Pull Task is cancelled after doPullTask, MessageQueue [topic=t, brokerName=rmq-b5, queueId=3]

2. Before the rebalance there were two consumers (two cids). After the new consumer started there were three (three cids), but the rebalance was triggered several times with the wrong cid count, which means the findConsumerIdList API returned a wrong value. I think the wrong rebalance should not cause any message loss by itself, since rebalance is done in a single thread and the final result should be correct. But why a wrong cid list is returned is interesting (is the client calling a wrong broker? a slave?). Below are the logs (I changed some internal IPs and brokerName info for security reasons):

2021-03-09 20:16:19.777 INFO [RebalanceService] (Slf4jLoggerFactory.java:100) - rebalanced result changed. allocateMessageQueueStrategyName=AVG, group=c_g, topic=t, clientId=XXX_C1, mqAllSize=24, cidAllSize=3, rebalanceResultSize=8, rebalanceResultSet=XXX (3 cid, correct)
2021-03-09 20:16:19.779 INFO [RebalanceService] (Slf4jLoggerFactory.java:100) - rebalanced result changed. allocateMessageQueueStrategyName=AVG, group=c_g, topic=t, clientId=XXX_C1, mqAllSize=24, cidAllSize=2, rebalanceResultSize=12, rebalanceResultSet=XXX (2 cid, wrong)
2021-03-09 20:16:19.781 INFO [RebalanceService] (Slf4jLoggerFactory.java:100) - rebalanced result changed. allocateMessageQueueStrategyName=AVG, group=c_g, topic=t, clientId=XXX_C1, mqAllSize=24, cidAllSize=3, rebalanceResultSize=8, rebalanceResultSet=XXX (3 cid, correct)
2021-03-09 20:16:19.784 INFO [RebalanceService] (Slf4jLoggerFactory.java:100) - rebalanced result changed. allocateMessageQueueStrategyName=AVG, group=c_g, topic=t, clientId=XXX_C1, mqAllSize=24, cidAllSize=2, rebalanceResultSize=12, rebalanceResultSet=XXX (2 cid, wrong)
2021-03-09 20:16:19.785 INFO [RebalanceService] (Slf4jLoggerFactory.java:100) - rebalanced result changed. allocateMessageQueueStrategyName=AVG, group=c_g, topic=t, clientId=XXX_C1, mqAllSize=24, cidAllSize=3, rebalanceResultSize=8, rebalanceResultSet=XXX (3 cid, correct)

3. In the rocketmq-client log, the rebalance notifications from the broker show slave broker IPs.

Additional info:
- In one Java process we have one consumer and one producer, with different clientIds.
- The consumer polls messages for one group and one topic (only one subscription).
- The producer sends messages to many topics (all different from the consumer's topic).

2. Please tell us about your environment:

RMQ 4.7.1, LitePullConsumer
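
To make the numbers in the rebalance logs easier to follow, here is a minimal sketch of an even, contiguous split of queues across consumers. It is not RocketMQ's actual AVG strategy code; the class name `AvgAllocationSketch`, the method `allocate`, and the choice of consumer position 0 are all hypothetical, and the queue indexes are abstract positions in a sorted queue list, not real queueIds. Under those assumptions it shows why the same client goes from 8 queues to 12 and back when the reported cid count flips between 3 and 2.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: even, contiguous split of queue indexes across consumers. */
final class AvgAllocationSketch {

    /**
     * Returns the queue indexes assigned to the consumer at position cidIndex
     * when mqCount queues are split as evenly as possible among cidCount consumers.
     */
    static List<Integer> allocate(int mqCount, int cidCount, int cidIndex) {
        if (mqCount < 0 || cidCount <= 0 || cidIndex < 0 || cidIndex >= cidCount) {
            throw new IllegalArgumentException("invalid queue count, consumer count or index");
        }
        int base = mqCount / cidCount;
        int remainder = mqCount % cidCount;
        // The first `remainder` consumers take one extra queue each.
        int size = base + (cidIndex < remainder ? 1 : 0);
        int start = cidIndex * base + Math.min(cidIndex, remainder);
        List<Integer> result = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            result.add(start + i);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical: this client is assumed to sit at position 0 in the sorted cid list.
        List<Integer> withThree = allocate(24, 3, 0);
        List<Integer> withTwo = allocate(24, 2, 0);

        // Queues that are picked up when cidAllSize=2 is reported and dropped
        // again when cidAllSize=3 is reported.
        List<Integer> flapping = new ArrayList<>(withTwo);
        flapping.removeAll(withThree);

        System.out.println("cidAllSize=3 -> " + withThree.size() + " queues");
        System.out.println("cidAllSize=2 -> " + withTwo.size() + " queues");
        System.out.println("queues flapping between rebalances: " + flapping);
    }
}
```

With mqAllSize=24 the sketch gives 8 queues for three consumers and 12 for two, matching the rebalanceResultSize values in the logs. In this simplified model, each flip makes the client take over and then release four queues within a few milliseconds, which is the window in which the duplicate pull-task cancellations above were logged.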
