pdeva opened a new issue #6172: nodes should allow draining
URL: https://github.com/apache/incubator-druid/issues/6172
 
 
   Currently, individual nodes have no concept of draining.
   This means that when you update the cluster, queries in progress fail as you take nodes down.
   Similarly, if you have broker nodes behind a load balancer, there is no way to tell the load balancer to stop sending new connections to the node you are about to update. Depending on the health check interval, this can result in many seconds of requests being sent to a node that is already down.
   
   I suggest adding a couple of endpoints to broker nodes:
   
   1. `/health/ping` returns 200 when the broker is ready to serve queries.
   2. `/health/startDrain` sets a flag that makes `/health/ping` return 500. This makes load balancer health checks fail without dropping existing connections, resulting in zero-downtime updates.
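
   To make the proposal concrete, here is a minimal sketch of the two endpoints in Python, using only the standard library. It is not Druid code. The `draining` flag, the `health_status` helper, the `serve` function, the port, and the choice to accept both GET and POST are all assumptions made for illustration; the proposal itself only names the two paths and the 200/500 status codes.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical process-wide drain flag; once set, it is never cleared in this sketch.
draining = threading.Event()


def health_status(path: str) -> int:
    """Return the HTTP status code for a health endpoint path."""
    if path == "/health/ping":
        # 200 while serving normally, 500 once draining has started.
        return 500 if draining.is_set() else 200
    if path == "/health/startDrain":
        # Only flips the flag; existing connections are left untouched.
        draining.set()
        return 200
    return 404


class HealthHandler(BaseHTTPRequestHandler):
    def _respond(self) -> None:
        self.send_response(health_status(self.path))
        self.send_header("Content-Length", "0")
        self.end_headers()

    # Assumption: the proposal does not name an HTTP method, so accept both.
    do_GET = _respond
    do_POST = _respond


def serve(port: int = 8082) -> None:
    """Run the health server until interrupted (the port is illustrative)."""
    ThreadingHTTPServer(("127.0.0.1", port), HealthHandler).serve_forever()
```

   In a rolling update, an operator would call `/health/startDrain` on a node, wait until the load balancer has marked it unhealthy and in-flight queries have finished, and only then stop the process.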
   
   Similar endpoints can be added to MM (MiddleManager) and Historical nodes, with the coordinator performing the health check. When the health check returns a non-200 status, the coordinator can instruct the broker not to send any new queries to that node.
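
   The coordinator-side check could look roughly like the following sketch. It is an illustration of the idea only, not how Druid's coordinator works: `http_ping`, `routable_nodes`, the timeout, and the node URLs are hypothetical names and values. A node counts as routable only if its `/health/ping` answers 200.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List


def http_ping(base_url: str, timeout: float = 2.0) -> int:
    """Return the status code of GET <base_url>/health/ping, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(base_url + "/health/ping", timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses, such as the 500 sent while draining, arrive here.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return 0


def routable_nodes(
    nodes: Iterable[str], ping: Callable[[str], int] = http_ping
) -> List[str]:
    """Keep only the nodes whose health check returns 200."""
    return [node for node in nodes if ping(node) == 200]
```

   The `ping` parameter is injectable so the filtering logic can be exercised without real nodes, for example `routable_nodes(urls, ping=lambda url: 200)`.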
   
   This will result in zero-downtime rolling updates.

