a2l007 opened a new issue #6518: [Proposal] Shutdown druid processes upon 
complete loss of ZK connectivity
URL: https://github.com/apache/incubator-druid/issues/6518
 
 
   Currently, if connectivity between the Druid nodes and ZooKeeper is lost, 
Curator retries the connection and eventually gives up. At that point the 
Druid node is left in an inconsistent state. A broker in this state would 
still serve queries but could return incorrect results. A historical that has 
lost ZK connectivity would not show up on the coordinator console even though 
its process is still running, which can be tricky for cluster operators to 
identify.
   The proposal that I'm working on is to shut down the Druid process once the 
connection retries to ZK are exhausted. Shutting down the process makes more 
sense than leaving the node in an unstable state: a shutdown can trigger 
configured process alerts, and if a supervisor process is configured, it can 
restart the Druid process.
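
   To make the proposed behaviour concrete, here is a minimal sketch of the 
"retry, then shut down" idea in plain Java. It is not Druid or Curator code: 
the class name ZkConnectionWatchdog, its parameters, and the callback are all 
hypothetical, and Curator's real retry policies and error reporting are not 
modeled. The connection attempt is passed in as a BooleanSupplier so the 
sketch stays self-contained.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration only; not part of Druid or Curator.
final class ZkConnectionWatchdog {
    private final int maxRetries;              // retries allowed after the first attempt
    private final long sleepMillis;            // fixed pause between attempts
    private final Runnable onRetriesExhausted; // e.g. stop the process (assumption)

    ZkConnectionWatchdog(int maxRetries, long sleepMillis, Runnable onRetriesExhausted) {
        if (maxRetries < 0 || sleepMillis < 0) {
            throw new IllegalArgumentException("maxRetries and sleepMillis must be >= 0");
        }
        this.maxRetries = maxRetries;
        this.sleepMillis = sleepMillis;
        this.onRetriesExhausted = onRetriesExhausted;
    }

    /**
     * Tries to connect up to 1 + maxRetries times.
     * Returns true on the first successful attempt. If every attempt fails,
     * runs the exhaustion callback once and returns false. If the thread is
     * interrupted while waiting, restores the interrupt flag and returns
     * false without running the callback.
     */
    boolean connect(BooleanSupplier attempt) {
        for (int i = 0; i <= maxRetries; i++) {
            if (attempt.getAsBoolean()) {
                return true;
            }
            if (i < maxRetries) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        // Retries exhausted: hand control to the shutdown action instead of
        // leaving the node running in an unstable state.
        onRetriesExhausted.run();
        return false;
    }
}
```

   In a real process the callback might log the failure and then call 
System.exit(1), so that process alerts fire or a supervisor restarts the 
node. That wiring is an assumption of this sketch, not something the proposal 
specifies.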
