wdtte opened a new issue, #184:
URL: https://github.com/apache/rocketmq-exporter/issues/184

   The rocketmq_group_diff metric collected by rocketmq-exporter shows a large number of gaps; some series have only one data point over several days. Grafana screenshot:
   <img width="1582" height="671" alt="Image" src="https://github.com/user-attachments/assets/795d66d8-e4f4-48c7-9d09-738f87d47d16" />
   
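   Possibly related errors from the rocketmq-exporter startup log:
   (The errors roughly point to a failure communicating with the broker. If there were a real network problem, the business side would have been affected long ago, but so far only the monitoring data is incomplete, so I don't know how to investigate further.)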
   ```
   [2025-11-03 10:39:55.140] ERROR get topic's(paas_oplog_****) consumer-stats(oplog-****-***) exception
   org.apache.rocketmq.remoting.exception.RemotingSendRequestException: send request to <172.17.41.89:10911> failed
        at org.apache.rocketmq.remoting.netty.NettyRemotingAbstract.invokeSyncImpl(NettyRemotingAbstract.java:441)
        at org.apache.rocketmq.remoting.netty.NettyRemotingClient.invokeSync(NettyRemotingClient.java:390)
        at org.apache.rocketmq.client.impl.MQClientAPIImpl.getConsumeStats(MQClientAPIImpl.java:1220)
        at org.apache.rocketmq.tools.admin.DefaultMQAdminExtImpl.examineConsumeStats(DefaultMQAdminExtImpl.java:315)
        at org.apache.rocketmq.tools.admin.DefaultMQAdminExt.examineConsumeStats(DefaultMQAdminExt.java:258)
        at org.apache.rocketmq.exporter.service.client.MQAdminExtImpl.examineConsumeStats(MQAdminExtImpl.java:232)
        at org.apache.rocketmq.exporter.task.MetricsCollectTask.collectConsumerOffset(MetricsCollectTask.java:336)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:84)
        at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54)
        at org.springframework.scheduling.concurrent.ReschedulingRunnable.run(ReschedulingRunnable.java:95)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
   ```
   
   Besides that, there are many other errors:
   
   ```
   [2025-11-03 11:09:20.003]  WARN ClientMetricTask-exception.ignore. group=paas-****-*****-consumer,client [email protected]:9876;172.17.41.80:9876, client addr=172.17.45.10:55377, language=JAVA,version=477
   org.apache.rocketmq.remoting.exception.RemotingSendRequestException: send request to <172.17.41.99:10911> failed
        at org.apache.rocketmq.remoting.netty.NettyRemotingAbstract.invokeSyncImpl(NettyRemotingAbstract.java:441)
        at org.apache.rocketmq.remoting.netty.NettyRemotingClient.invokeSync(NettyRemotingClient.java:390)
        at org.apache.rocketmq.client.impl.MQClientAPIImpl.getConsumerRunningInfo(MQClientAPIImpl.java:1917)
        at org.apache.rocketmq.tools.admin.DefaultMQAdminExtImpl.getConsumerRunningInfo(DefaultMQAdminExtImpl.java:842)
        at org.apache.rocketmq.tools.admin.DefaultMQAdminExt.getConsumerRunningInfo(DefaultMQAdminExt.java:469)
        at org.apache.rocketmq.exporter.service.client.MQAdminExtImpl.getConsumerRunningInfo(MQAdminExtImpl.java:407)
        at org.apache.rocketmq.exporter.task.ClientMetricTaskRunnable.run(ClientMetricTaskRunnable.java:64)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
   [2025-11-03 11:09:20.006]  INFO closeChannel: close the connection to remote address[172.17.41.99:10911] result: true
   [2025-11-03 11:09:20.007]  WARN ClientMetricTask-exception.ignore. group=oplog-***-***,client [email protected]:9876;172.17.41.80:9876, client addr=172.17.5.14:58456, language=JAVA,version=477
   org.apache.rocketmq.remoting.exception.RemotingSendRequestException: send request to <172.17.41.99:10911> failed
        at org.apache.rocketmq.remoting.netty.NettyRemotingAbstract.invokeSyncImpl(NettyRemotingAbstract.java:441)
        at org.apache.rocketmq.remoting.netty.NettyRemotingClient.invokeSync(NettyRemotingClient.java:390)
        at org.apache.rocketmq.client.impl.MQClientAPIImpl.getConsumerRunningInfo(MQClientAPIImpl.java:1917)
        at org.apache.rocketmq.tools.admin.DefaultMQAdminExtImpl.getConsumerRunningInfo(DefaultMQAdminExtImpl.java:842)
        at org.apache.rocketmq.tools.admin.DefaultMQAdminExt.getConsumerRunningInfo(DefaultMQAdminExt.java:469)
        at org.apache.rocketmq.exporter.service.client.MQAdminExtImpl.getConsumerRunningInfo(MQAdminExtImpl.java:407)
        at org.apache.rocketmq.exporter.task.ClientMetricTaskRunnable.run(ClientMetricTaskRunnable.java:64)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
   [2025-11-03 11:09:25.003]  WARN collectProducer. should not be here. cluster=**-****, brokerName=broker-k, name srv= ["172.17.41.80:9876"]
   ```
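
To see whether the `RemotingSendRequestException` failures concentrate on particular brokers, the exception lines in the exporter log could be tallied per broker address. The following is only a minimal sketch, assuming the log is available as plain text; the helper name `count_failed_brokers` is hypothetical and is not part of rocketmq-exporter.

```python
import re
from collections import Counter

# Matches the broker address in "send request to <ip:port> failed".
# \s+ tolerates the message being wrapped across lines, as in pasted logs.
FAILED_SEND = re.compile(r"send\s+request\s+to\s+<([^<>\s]+)>\s+failed")


def count_failed_brokers(log_text: str) -> Counter:
    """Tally 'send request to <addr> failed' occurrences per broker address."""
    return Counter(FAILED_SEND.findall(log_text))


if __name__ == "__main__":
    # Sample lines shaped like the exceptions quoted in this issue.
    sample = (
        "RemotingSendRequestException: send request to <172.17.41.89:10911> failed\n"
        "RemotingSendRequestException: send request to <172.17.41.99:10911> failed\n"
        "RemotingSendRequestException: send request to <172.17.41.99:10911> failed\n"
    )
    # Print brokers ordered by number of failed requests, highest first.
    for broker, failures in count_failed_brokers(sample).most_common():
        print(broker, failures)
```

If the counts are spread evenly across all brokers, that would hint at something on the exporter side rather than a single unhealthy broker; this is an interpretation aid, not a diagnosis.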
   
   ```
   [2025-11-03 10:24:23.454]  INFO Completed initialization in 1 ms
   [2025-11-03 10:25:15.000]  INFO broker stats collection task starting....
   [2025-11-03 10:25:15.000]  INFO broker runtime stats collection task starting....
   [2025-11-03 10:25:15.000]  INFO consumer offset collection task starting....
   [2025-11-03 10:25:15.001]  INFO broker topic stats collection task starting....
   [2025-11-03 10:25:15.001]  INFO producer metric collection task starting....
   [2025-11-03 10:25:15.639]  INFO broker runtime stats collection task finished....639
   [2025-11-03 10:25:15.639]  INFO topic offset collection task starting....
   [2025-11-03 10:25:15.644]  INFO broker stats collection task finished....644
   [2025-11-03 10:25:16.554]  WARN collectTopicOffset-getting topic(%RETRY%oplog-object-change) stats error. the namesrv address is ["172.17.41.80:9876"]
   [2025-11-03 10:25:20.079]  WARN collectProducer. should not be here. cluster=**-****, brokerName=broker-j, name srv= ["172.17.41.80:9876"]
   [2025-11-03 10:25:20.079]  WARN collectProducer. there are no producers in cluster=**-****, brokerName=broker-j, name srv= ["172.17.41.80:9876"]
   [2025-11-03 10:25:20.081]  INFO closeChannel: close the connection to remote address[172.17.41.99:10911] result: true
   [2025-11-03 10:25:25.085]  WARN collectProducer. should not be here. cluster=**-****, brokerName=broker-k, name srv= ["172.17.41.80:9876"]
   [2025-11-03 10:25:25.085]  WARN collectProducer. there are no producers in cluster=**-****, brokerName=broker-k, name srv= ["172.17.41.80:9876"]
   [2025-11-03 10:25:25.085]  INFO closeChannel: close the connection to remote address[172.17.41.101:10911] result: true
   [2025-11-03 10:25:30.086]  WARN collectProducer. should not be here. cluster=**-****, brokerName=broker-h, name srv= ["172.17.41.80:9876"]
   [2025-11-03 10:25:30.086]  WARN collectProducer. there are no producers in cluster=**-****, brokerName=broker-h, name srv= ["172.17.41.80:9876"]
   [2025-11-03 10:25:30.089]  INFO closeChannel: close the connection to remote address[172.17.41.95:10911] result: true
   [2025-11-03 10:25:30.089]  INFO closeChannel: close the connection to remote address[172.17.41.95:10911] result: true
   [2025-11-03 10:25:35.088]  WARN collectProducer. should not be here. cluster=**-****, brokerName=broker-i, name srv= ["172.17.41.80:9876"]
   [2025-11-03 10:25:35.088]  INFO closeChannel: close the connection to remote address[172.17.41.97:10911] result: true
   [2025-11-03 10:25:35.088]  WARN collectProducer. there are no producers in cluster=**-****, brokerName=broker-i, name srv= ["172.17.41.80:9876"]
   [2025-11-03 10:25:40.089]  WARN collectProducer. should not be here. cluster=**-****, brokerName=broker-f, name srv= ["172.17.41.80:9876"]
   [2025-11-03 10:25:40.089]  INFO closeChannel: close the connection to remote address[172.17.41.91:10911] result: true
   [2025-11-03 10:25:40.089]  WARN collectProducer. there are no producers in cluster=**-****, brokerName=broker-f, name srv= ["172.17.41.80:9876"]
   [2025-11-03 10:25:45.089]  WARN collectProducer. should not be here. cluster=**-****, brokerName=broker-g, name srv= ["172.17.41.80:9876"]
   [2025-11-03 10:25:45.090]  WARN collectProducer. there are no producers in cluster=**-****, brokerName=broker-g, name srv= ["172.17.41.80:9876"]
   [2025-11-03 10:25:45.096]  INFO closeChannel: close the connection to remote address[172.17.41.93:10911] result: true
   [2025-11-03 10:25:45.495]  INFO topic offset collection task finished....29856
   [2025-11-03 10:25:50.090]  WARN collectProducer. should not be here. cluster=**-****, brokerName=broker-d, name srv= ["172.17.41.80:9876"]
   [2025-11-03 10:25:50.090]  WARN collectProducer. there are no producers in cluster=**-****, brokerName=broker-d, name srv= ["172.17.41.80:9876"]
   [2025-11-03 10:25:50.091]  INFO closeChannel: close the connection to remote address[172.17.41.87:10911] result: true
   [2025-11-03 10:25:55.092]  WARN collectProducer. should not be here. cluster=**-****, brokerName=broker-e, name srv= ["172.17.41.80:9876"]
   [2025-11-03 10:25:55.092]  WARN collectProducer. there are no producers in cluster=**-****, brokerName=broker-e, name srv= ["172.17.41.80:9876"]
   [2025-11-03 10:25:55.094]  INFO closeChannel: close the connection to remote address[172.17.41.89:10911] result: true
   [2025-11-03 10:26:00.092]  WARN collectProducer. should not be here. cluster=**-****, brokerName=broker-b, name srv= ["172.17.41.80:9876"]
   [2025-11-03 10:26:00.093]  WARN collectProducer. there are no producers in cluster=**-****, brokerName=broker-b, name srv= ["172.17.41.80:9876"]
   [2025-11-03 10:26:00.093]  INFO closeChannel: close the connection to remote address[172.17.41.83:10911] result: true
   [2025-11-03 10:26:05.096]  WARN collectProducer. should not be here. cluster=**-****, brokerName=broker-c, name srv= ["172.17.41.80:9876"]
   [2025-11-03 10:26:05.096]  WARN collectProducer. there are no producers in cluster=**-****, brokerName=broker-c, name srv= ["172.17.41.80:9876"]
   [2025-11-03 10:26:05.285]  INFO closeChannel: close the connection to remote address[172.17.41.85:10911] result: true
   [2025-11-03 10:26:10.096]  WARN collectProducer. should not be here. cluster=**-****, brokerName=broker-a, name srv= ["172.17.41.80:9876"]
   [2025-11-03 10:26:10.097]  WARN collectProducer. there are no producers in cluster=**-****, brokerName=broker-a, name srv= ["172.17.41.80:9876"]
   [2025-11-03 10:26:10.097]  INFO closeChannel: close the connection to remote address[172.17.41.81:10911] result: true
   [2025-11-03 10:26:15.000]  INFO topic offset collection task starting....
   [2025-11-03 10:26:15.000]  INFO broker runtime stats collection task starting....
   [2025-11-03 10:26:15.045]  INFO broker runtime stats collection task finished....44
   ```
   
   How can we resolve this problem? Or which directions should we investigate?


