wdtte opened a new issue, #184: URL: https://github.com/apache/rocketmq-exporter/issues/184
The rocketmq_group_diff metric collected by rocketmq-exporter has many gaps in its data; in some cases there is only one data point every few days. Grafana screenshot:

<img width="1582" height="671" alt="Image" src="https://github.com/user-attachments/assets/795d66d8-e4f4-48c7-9d09-738f87d47d16" />

Possibly related errors from the rocketmq-exporter startup log:

(The error messages point roughly to a failure communicating with the broker. If there really were a network problem, the business side would have been affected long ago, but so far the only symptom is incomplete monitoring data, so we do not know how to follow up.)

```
[2025-11-03 10:39:55.140] ERROR get topic's(paas_oplog_****) consumer-stats(oplog-****-***) exception
org.apache.rocketmq.remoting.exception.RemotingSendRequestException: send request to <172.17.41.89:10911> failed
    at org.apache.rocketmq.remoting.netty.NettyRemotingAbstract.invokeSyncImpl(NettyRemotingAbstract.java:441)
    at org.apache.rocketmq.remoting.netty.NettyRemotingClient.invokeSync(NettyRemotingClient.java:390)
    at org.apache.rocketmq.client.impl.MQClientAPIImpl.getConsumeStats(MQClientAPIImpl.java:1220)
    at org.apache.rocketmq.tools.admin.DefaultMQAdminExtImpl.examineConsumeStats(DefaultMQAdminExtImpl.java:315)
    at org.apache.rocketmq.tools.admin.DefaultMQAdminExt.examineConsumeStats(DefaultMQAdminExt.java:258)
    at org.apache.rocketmq.exporter.service.client.MQAdminExtImpl.examineConsumeStats(MQAdminExtImpl.java:232)
    at org.apache.rocketmq.exporter.task.MetricsCollectTask.collectConsumerOffset(MetricsCollectTask.java:336)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:84)
    at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54)
    at org.springframework.scheduling.concurrent.ReschedulingRunnable.run(ReschedulingRunnable.java:95)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
```

Besides this, there are many other errors:

```
[2025-11-03 11:09:20.003] WARN ClientMetricTask-exception.ignore. group=paas-****-*****-consumer,client [email protected]:9876;172.17.41.80:9876, client addr=172.17.45.10:55377, language=JAVA,version=477
org.apache.rocketmq.remoting.exception.RemotingSendRequestException: send request to <172.17.41.99:10911> failed
    at org.apache.rocketmq.remoting.netty.NettyRemotingAbstract.invokeSyncImpl(NettyRemotingAbstract.java:441)
    at org.apache.rocketmq.remoting.netty.NettyRemotingClient.invokeSync(NettyRemotingClient.java:390)
    at org.apache.rocketmq.client.impl.MQClientAPIImpl.getConsumerRunningInfo(MQClientAPIImpl.java:1917)
    at org.apache.rocketmq.tools.admin.DefaultMQAdminExtImpl.getConsumerRunningInfo(DefaultMQAdminExtImpl.java:842)
    at org.apache.rocketmq.tools.admin.DefaultMQAdminExt.getConsumerRunningInfo(DefaultMQAdminExt.java:469)
    at org.apache.rocketmq.exporter.service.client.MQAdminExtImpl.getConsumerRunningInfo(MQAdminExtImpl.java:407)
    at org.apache.rocketmq.exporter.task.ClientMetricTaskRunnable.run(ClientMetricTaskRunnable.java:64)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
[2025-11-03 11:09:20.006] INFO closeChannel: close the connection to remote address[172.17.41.99:10911] result: true
[2025-11-03 11:09:20.007] WARN ClientMetricTask-exception.ignore. group=oplog-***-***,client [email protected]:9876;172.17.41.80:9876, client addr=172.17.5.14:58456, language=JAVA,version=477
org.apache.rocketmq.remoting.exception.RemotingSendRequestException: send request to <172.17.41.99:10911> failed
    at org.apache.rocketmq.remoting.netty.NettyRemotingAbstract.invokeSyncImpl(NettyRemotingAbstract.java:441)
    at org.apache.rocketmq.remoting.netty.NettyRemotingClient.invokeSync(NettyRemotingClient.java:390)
    at org.apache.rocketmq.client.impl.MQClientAPIImpl.getConsumerRunningInfo(MQClientAPIImpl.java:1917)
    at org.apache.rocketmq.tools.admin.DefaultMQAdminExtImpl.getConsumerRunningInfo(DefaultMQAdminExtImpl.java:842)
    at org.apache.rocketmq.tools.admin.DefaultMQAdminExt.getConsumerRunningInfo(DefaultMQAdminExt.java:469)
    at org.apache.rocketmq.exporter.service.client.MQAdminExtImpl.getConsumerRunningInfo(MQAdminExtImpl.java:407)
    at org.apache.rocketmq.exporter.task.ClientMetricTaskRunnable.run(ClientMetricTaskRunnable.java:64)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
[2025-11-03 11:09:25.003] WARN collectProducer. should not be here. cluster=**-****, brokerName=broker-k, name srv= ["172.17.41.80:9876"]
```

```
[2025-11-03 10:24:23.454] INFO Completed initialization in 1 ms
[2025-11-03 10:25:15.000] INFO broker stats collection task starting....
[2025-11-03 10:25:15.000] INFO broker runtime stats collection task starting....
[2025-11-03 10:25:15.000] INFO consumer offset collection task starting....
[2025-11-03 10:25:15.001] INFO broker topic stats collection task starting....
[2025-11-03 10:25:15.001] INFO producer metric collection task starting....
[2025-11-03 10:25:15.639] INFO broker runtime stats collection task finished....639
[2025-11-03 10:25:15.639] INFO topic offset collection task starting....
[2025-11-03 10:25:15.644] INFO broker stats collection task finished....644
[2025-11-03 10:25:16.554] WARN collectTopicOffset-getting topic(%RETRY%oplog-object-change) stats error. the namesrv address is ["172.17.41.80:9876"]
[2025-11-03 10:25:20.079] WARN collectProducer. should not be here. cluster=**-****, brokerName=broker-j, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:25:20.079] WARN collectProducer. there are no producers in cluster=**-****, brokerName=broker-j, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:25:20.081] INFO closeChannel: close the connection to remote address[172.17.41.99:10911] result: true
[2025-11-03 10:25:25.085] WARN collectProducer. should not be here. cluster=**-****, brokerName=broker-k, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:25:25.085] WARN collectProducer. there are no producers in cluster=**-****, brokerName=broker-k, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:25:25.085] INFO closeChannel: close the connection to remote address[172.17.41.101:10911] result: true
[2025-11-03 10:25:30.086] WARN collectProducer. should not be here. cluster=**-****, brokerName=broker-h, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:25:30.086] WARN collectProducer. there are no producers in cluster=**-****, brokerName=broker-h, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:25:30.089] INFO closeChannel: close the connection to remote address[172.17.41.95:10911] result: true
[2025-11-03 10:25:30.089] INFO closeChannel: close the connection to remote address[172.17.41.95:10911] result: true
[2025-11-03 10:25:35.088] WARN collectProducer. should not be here. cluster=**-****, brokerName=broker-i, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:25:35.088] INFO closeChannel: close the connection to remote address[172.17.41.97:10911] result: true
[2025-11-03 10:25:35.088] WARN collectProducer. there are no producers in cluster=**-****, brokerName=broker-i, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:25:40.089] WARN collectProducer. should not be here. cluster=**-****, brokerName=broker-f, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:25:40.089] INFO closeChannel: close the connection to remote address[172.17.41.91:10911] result: true
[2025-11-03 10:25:40.089] WARN collectProducer. there are no producers in cluster=**-****, brokerName=broker-f, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:25:45.089] WARN collectProducer. should not be here. cluster=**-****, brokerName=broker-g, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:25:45.090] WARN collectProducer. there are no producers in cluster=**-****, brokerName=broker-g, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:25:45.096] INFO closeChannel: close the connection to remote address[172.17.41.93:10911] result: true
[2025-11-03 10:25:45.495] INFO topic offset collection task finished....29856
[2025-11-03 10:25:50.090] WARN collectProducer. should not be here. cluster=**-****, brokerName=broker-d, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:25:50.090] WARN collectProducer. there are no producers in cluster=**-****, brokerName=broker-d, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:25:50.091] INFO closeChannel: close the connection to remote address[172.17.41.87:10911] result: true
[2025-11-03 10:25:55.092] WARN collectProducer. should not be here. cluster=**-****, brokerName=broker-e, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:25:55.092] WARN collectProducer. there are no producers in cluster=**-****, brokerName=broker-e, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:25:55.094] INFO closeChannel: close the connection to remote address[172.17.41.89:10911] result: true
[2025-11-03 10:26:00.092] WARN collectProducer. should not be here. cluster=**-****, brokerName=broker-b, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:26:00.093] WARN collectProducer. there are no producers in cluster=**-****, brokerName=broker-b, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:26:00.093] INFO closeChannel: close the connection to remote address[172.17.41.83:10911] result: true
[2025-11-03 10:26:05.096] WARN collectProducer. should not be here. cluster=**-****, brokerName=broker-c, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:26:05.096] WARN collectProducer. there are no producers in cluster=**-****, brokerName=broker-c, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:26:05.285] INFO closeChannel: close the connection to remote address[172.17.41.85:10911] result: true
[2025-11-03 10:26:10.096] WARN collectProducer. should not be here. cluster=**-****, brokerName=broker-a, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:26:10.097] WARN collectProducer. there are no producers in cluster=**-****, brokerName=broker-a, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:26:10.097] INFO closeChannel: close the connection to remote address[172.17.41.81:10911] result: true
[2025-11-03 10:26:15.000] INFO topic offset collection task starting....
[2025-11-03 10:26:15.000] INFO broker runtime stats collection task starting....
[2025-11-03 10:26:15.045] INFO broker runtime stats collection task finished....44
```

How can we resolve this problem, or which directions should we investigate?
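
One possible first triage step is to tally the `RemotingSendRequestException` failures in the exporter log per broker address, to see whether they concentrate on a few brokers or are spread across the whole cluster. The following is only a minimal sketch and not part of rocketmq-exporter: the class name `BrokerFailureTally` and the log-file argument are hypothetical, and it assumes every failure appears in the log as `send request to <host:port> failed`, as in the excerpts above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of rocketmq-exporter.
public class BrokerFailureTally {

    // Captures the address inside "send request to <172.17.41.99:10911> failed".
    private static final Pattern SEND_FAILED =
            Pattern.compile("send request to <([^>]+)> failed");

    /** Counts "send request ... failed" occurrences per broker address. */
    public static Map<String, Integer> tally(List<String> logLines) {
        // TreeMap keeps the addresses in a stable, sorted order for printing.
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : logLines) {
            Matcher m = SEND_FAILED.matcher(line);
            while (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // With an argument, read that log file; otherwise use lines taken from this issue.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList(
                    "org.apache.rocketmq.remoting.exception.RemotingSendRequestException: send request to <172.17.41.89:10911> failed",
                    "org.apache.rocketmq.remoting.exception.RemotingSendRequestException: send request to <172.17.41.99:10911> failed",
                    "org.apache.rocketmq.remoting.exception.RemotingSendRequestException: send request to <172.17.41.99:10911> failed");
        tally(lines).forEach((addr, n) -> System.out.println(addr + "\t" + n));
    }
}
```

If the counts cluster on one or two addresses, those brokers and the path from the exporter host to them are the natural place to look first; an even spread would point more toward the exporter side.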
