Fryuni commented on issue #12292:
URL: https://github.com/apache/druid/issues/12292#issuecomment-1308001765

   > Currently we are facing the issue that our historicals are restarting 
every now and then (using kubernetes).
   > This can have multiple reasons, but one of them is unexpected exceptions 
in the curator.
   
   This portion seems related to #13167 
   
   > The scenario is that we have volumes filled with segments already and 
druid starts to initialize/read them.
   > This is taking longer than 60 seconds, so kubelet is killing the pod 
before the stage "SERVER" can be reached and the health route is available.
   
   A `/status/alive` route that starts independently of the historical 
service would be somewhat redundant. The historical process already exits 
completely when there is a problem that would mark it as dead, and a failure 
that does not exit the program but leaves the process in an unrecoverable 
state would likely not be detected by a route that is decoupled from the problem.
   
   One thing you can do is configure the existing `/status/health` route as a 
readiness and startup probe, but not as a liveness probe.
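   As a rough sketch of that setup (the port and timing values below are 
assumptions, not part of this discussion; tune them to how long your 
historicals actually take to load cached segments), the pod spec could look 
like:

   ```yaml
   # Sketch: probe config for a Druid historical container.
   # Port 8083 and the timing values are assumptions; adjust them
   # to your deployment's segment-load time on startup.
   startupProbe:
     httpGet:
       path: /status/health
       port: 8083
     periodSeconds: 10
     failureThreshold: 60   # tolerates a long segment-loading phase
   readinessProbe:
     httpGet:
       path: /status/health
       port: 8083
     periodSeconds: 10
   # Intentionally no livenessProbe: the process exits on fatal errors,
   # so the container restartPolicy already handles the "dead" case.
   ```

   With `startupProbe` in place, the kubelet will not kill the pod while it 
is still reading segments from the volume, which is the failure mode described 
above.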
   
   But I do like the idea of a route that only returns OK when there are no 
pending segments to be loaded or dropped. It would flip between error and OK 
as new segments are ingested, but that does fit the use cases I have in 
mind.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
