a2l007 opened a new issue #6518: [Proposal] Shutdown druid processes upon complete loss of ZK connectivity
URL: https://github.com/apache/incubator-druid/issues/6518

Currently, if connectivity between the Druid nodes and ZooKeeper is lost, Curator retries the connection and eventually gives up. At that point the Druid node is left in an unstable state. A broker in this state would still serve queries but might return incorrect results. A historical that has lost ZK connectivity would not show up on the coordinator console even though the process is still running, which can be tricky for cluster operators to identify.

The proposal I'm working on is to shut down the Druid process once the ZK connection retries are exhausted. Shutting down makes more sense than leaving the node in an unstable state: a terminated process can trigger configured process alerts, and if a supervisor process is configured, it can restart the Druid process.
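To make the proposed behavior concrete, here is a minimal sketch of "retry a bounded number of times, then run a shutdown action". It is plain Java and does not use Curator or Druid APIs. The names `ZkConnectionGuard`, `ConnectAttempt`, and `connectOrShutDown` are hypothetical, and a real implementation would presumably hook into Curator's own retry and connection-state handling instead of looping by hand.

```java
/**
 * Hypothetical sketch, not Druid or Curator code: try to connect a bounded
 * number of times and run a caller-supplied action once retries are exhausted.
 */
final class ZkConnectionGuard {

    /** One connection attempt; returns true on success. (Assumed interface.) */
    interface ConnectAttempt {
        boolean tryConnect() throws Exception;
    }

    private final int maxRetries;
    private final long sleepMsBetweenRetries;
    private final Runnable onRetriesExhausted;

    ZkConnectionGuard(int maxRetries, long sleepMsBetweenRetries, Runnable onRetriesExhausted) {
        this.maxRetries = maxRetries;
        this.sleepMsBetweenRetries = sleepMsBetweenRetries;
        this.onRetriesExhausted = onRetriesExhausted;
    }

    /**
     * Makes one initial attempt plus up to maxRetries retries.
     * Returns true as soon as an attempt succeeds. Otherwise runs the
     * exhaustion action (for example, exiting the process) and returns false.
     */
    boolean connectOrShutDown(ConnectAttempt attempt) {
        for (int i = 0; i <= maxRetries; i++) {
            try {
                if (attempt.tryConnect()) {
                    return true;
                }
            } catch (Exception e) {
                // A thrown exception counts as a failed attempt.
            }
            if (i < maxRetries) {
                try {
                    Thread.sleep(sleepMsBetweenRetries);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop retrying early.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        onRetriesExhausted.run();
        return false;
    }
}
```

In a real process the exhaustion action could be something like `() -> System.exit(1)`, so that process alerts fire or a supervisor restarts the node. Passing the action in as a `Runnable` keeps the retry logic testable without actually terminating the JVM.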
