Fryuni commented on issue #12292: URL: https://github.com/apache/druid/issues/12292#issuecomment-1308001765
> Currently we are facing the issue that our historicals are restarting every now and then (using kubernetes).
> This can have multiple reasons, but one of them is unexpected exceptions in the curator.

This portion seems related to #13167

> The scenario is that we have volumes filled with segments already and druid starts to initialize/read them.
> This is taking longer than 60 seconds, so kubelet is killing the pod before the stage "SERVER" can be reached and the health route is available.

A `/status/alive` route that starts independently of the historical service would be somewhat redundant. The historical process already exits completely when there is a problem that would mark it as dead, and a failure that does not exit the process but leaves it in an unrecoverable state would likely not be detected by a route that is decoupled from the problem.

One thing you can do is configure the existing `/status/health` endpoint as a readiness and startup probe, but not as a liveness probe.

But I do like the idea of a route that only returns OK when there are no pending segments to be loaded or dropped. It would flip between error and OK whenever new segments are ingested, but that does fit the use cases I have in mind.
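The probe setup suggested above could be sketched roughly like this. This is only an illustration, not a vetted configuration: the port (8083 is the default plaintext port for a Druid historical) and the probe timings are assumptions you would tune to how long your segment loading actually takes.

```yaml
# Sketch: Kubernetes probes for a Druid historical pod.
# Assumptions: container serves /status/health on the default port 8083;
# timing values are placeholders, not recommendations.
containers:
  - name: historical
    ports:
      - containerPort: 8083
    # Startup probe: kubelet waits for segment initialization to finish
    # instead of killing the pod after a fixed 60 seconds.
    startupProbe:
      httpGet:
        path: /status/health
        port: 8083
      periodSeconds: 10
      failureThreshold: 60   # tolerate up to ~10 minutes of startup
    # Readiness probe: keep the pod out of service endpoints until healthy.
    readinessProbe:
      httpGet:
        path: /status/health
        port: 8083
      periodSeconds: 10
    # Deliberately no livenessProbe: the process exits on fatal errors,
    # and kubelet already restarts exited containers.
```

With this shape, a slow startup delays readiness rather than triggering a restart loop, which matches the failure mode described in the quoted report.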
