ankitsultana opened a new issue, #11626: URL: https://github.com/apache/pinot/issues/11626
We use the following MBean to track Helix pending messages in Pinot servers:

```
CLMParticipantReport:MonitorType=ParticipantMessageMonitor,ParticipantName=some_instance_id:
```

We have often seen this MBean report pending Helix messages even though no thread is processing any Helix messages. There are also no messages in the `INSTANCES/<instance-id>/MESSAGES` ZNode.

In this particular instance, the following happened around the time we saw the issue:

* There was a big GC pause
* The ZK client lost its connection
* There's a Helix log message: `Tasks that never commenced execution after 200`. This is followed by a list of 238 FutureTask entries, and the MBean is set to exactly 238 pending messages
* There were failures in deleting Helix messages in `INSTANCES/<instance-id>/MESSAGES`. There were 9 such messages.

Some other signals can be seen in the attached Grafana screenshot.

<img width="1722" alt="image" src="https://github.com/apache/pinot/assets/8644710/9cc301bd-3d3f-4983-a810-88f5ff0abb62">
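For anyone trying to reproduce this, the symptom (a non-zero pending count from the MBean, an empty `MESSAGES` ZNode, and no thread handling messages) can be checked roughly as below. This is a minimal sketch, not Pinot or Helix code. The class `PendingMessageCheck` and its methods are hypothetical, and the attribute name `PendingMessages` is an assumption about what the monitor exposes, so verify it in a JMX console first. The ZNode child count and the handler thread count are expected to be collected by the caller, for example from a ZK client and a thread dump.

```java
import java.lang.management.ManagementFactory;
import java.util.OptionalLong;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical helper for illustration only; not part of Pinot or Helix.
final class PendingMessageCheck {

  // Builds the ObjectName quoted in the issue for a given participant.
  // Assumes the instance id contains no JMX-reserved characters (':', ',', '=', '*', '?').
  static ObjectName monitorName(String instanceId) throws JMException {
    return new ObjectName(
        "CLMParticipantReport:MonitorType=ParticipantMessageMonitor,ParticipantName=" + instanceId);
  }

  // Reads the pending-message gauge, or returns empty if the MBean is not registered.
  // "PendingMessages" is an assumed attribute name.
  static OptionalLong readPending(MBeanServer server, ObjectName name) throws JMException {
    if (!server.isRegistered(name)) {
      return OptionalLong.empty();
    }
    Object value = server.getAttribute(name, "PendingMessages");
    return OptionalLong.of(((Number) value).longValue());
  }

  // True when the gauge claims pending work but neither the MESSAGES ZNode
  // nor the message-handling threads show anything to back it up.
  static boolean looksStale(long pendingFromMBean, int znodeMessageCount, int activeHandlerThreads) {
    return pendingFromMBean > 0 && znodeMessageCount == 0 && activeHandlerThreads == 0;
  }

  public static void main(String[] args) throws JMException {
    String instanceId = args.length > 0 ? args[0] : "some_instance_id";
    MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    OptionalLong pending = readPending(server, monitorName(instanceId));
    if (pending.isPresent()) {
      // The two zeros stand in for values the caller would collect separately.
      System.out.println("pending=" + pending.getAsLong()
          + " stale=" + looksStale(pending.getAsLong(), 0, 0));
    } else {
      System.out.println("MBean not registered in this JVM");
    }
  }
}
```

The sketch reads the platform MBean server, so it only sees the monitor when it runs inside the Pinot server JVM; reading it from another process would need a remote JMX connection instead.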
