santosh-d3vpl3x opened a new issue, #18746:
URL: https://github.com/apache/druid/issues/18746

   ### Motivation
     - Reduce dependence on ZooKeeper by offering Consul as a first-class 
discovery and leader-election option.
     - Many operators already run Consul; supporting it simplifies their Druid 
deployments and aligns with existing service catalogs and security (ACL/TLS).
     - Completes the HTTP-based discovery path (works with the HTTP server view and task runner) for clusters outside Kubernetes.
   
   ### Proposed changes
     - Add contrib extension `druid-consul-extensions` that wires Consul into 
Druid discovery (announcer + service watcher) and leader election for 
Coordinator/Overlord.
     - Provide configuration for agent host/port, service prefix, datacenter pinning, ACL token, Basic Auth, TLS/mTLS, watch retry/backoff, health-check interval/TTL, deregister timeout, and extra service tags.
     - Package the extension and document setup, config examples, and 
operational notes.
     - Emit metrics for Consul registration, leader election, and watch 
retries; log failures clearly.
     - Add dev harness docs (e.g., docker run consul) and integration 
tests/smoke checks that exercise registration, discovery, and leader failover.
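
   A possible shape for the extension's `runtime.properties`, to make the configuration surface concrete. Every `druid.discovery.consul.*` key below is hypothetical; the final property names would be settled during review:

   ```properties
   # Load the proposed contrib extension.
   druid.extensions.loadList=["druid-consul-extensions"]

   # Hypothetical keys illustrating the configuration surface described above.
   druid.discovery.consul.host=127.0.0.1
   druid.discovery.consul.port=8500
   druid.discovery.consul.servicePrefix=druid
   druid.discovery.consul.datacenter=dc1
   druid.discovery.consul.aclToken=<redacted>
   druid.discovery.consul.tls.enabled=true
   druid.discovery.consul.healthCheck.intervalSeconds=10
   druid.discovery.consul.healthCheck.ttlSeconds=30
   druid.discovery.consul.deregisterTimeoutSeconds=60
   druid.discovery.consul.watch.retryBackoffSeconds=1
   druid.discovery.consul.extraTags=["tier:hot"]
   ```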
   
   ### Rationale
     - Consul is widely deployed and already provides a service catalog, a KV store, and sessions suitable for leader locks.
     - Alternatives considered: keeping ZooKeeper, or K8s API-based discovery; neither fits environments that have standardized on Consul or where operators want to retire ZooKeeper.
     - Reuses Druid’s existing HTTP-based server view/task runner, avoiding 
additional protocols.
   
   ### Operational impact
     - Prereqs: `druid.serverview.type=http` and 
`druid.indexer.runner.type=httpRemote` must be set cluster-wide before 
switching discovery/leader selectors to Consul.
     - Migration path from ZooKeeper:
       1) Enable HTTP server view/task runner while still on ZooKeeper 
discovery.
       2) Load the Consul extension and set Consul configs on all nodes (common 
DC).
       3) Cut over discovery + leader selectors to Consul per role/tier; 
monitor metrics/logs.
       4) Remove the ZooKeeper dependency after stability is validated; roll back by pointing selectors back to ZooKeeper if needed.
     - Rolling/zero downtime: true zero downtime is not realistic because discovery catalogs and leader election cannot be mixed across backends. Roll out the bits/configs while still on ZooKeeper, then perform a fast, coordinated switch to Consul (expect a brief disruption for restarts and the leader flip).
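
   The migration steps above can be sketched as common properties. The first two keys are the existing Druid settings named in the prereqs; the `druid.discovery.type` selector value is hypothetical:

   ```properties
   # Step 1 (existing Druid properties): HTTP server view and task runner,
   # rolled out cluster-wide while discovery is still on ZooKeeper.
   druid.serverview.type=http
   druid.indexer.runner.type=httpRemote

   # Steps 2-3 (hypothetical selector): load the extension everywhere, then cut
   # the discovery/leader selector over to Consul in a fast, coordinated switch.
   druid.extensions.loadList=["druid-consul-extensions"]
   druid.discovery.type=consul
   ```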
   
   ### Test plan (optional)
     - Stand up a local Consul agent and verify node registration, health TTL 
updates, service discovery watch, and Coordinator/Overlord leader election 
failover.
     - Add unit tests for config parsing and session/watch retry logic; 
integration tests for watch loop and leader lock behavior.
     - Manual smoke doc: docker-run Consul, run minimal Druid nodes with Consul 
discovery/election, exercise restart/failover.
     - Add a CI job for these integration checks that can be triggered on demand.
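
   The watch retry/backoff behavior slated for unit tests can be sketched generically as capped exponential backoff with jitter. This is an illustration of the intended shape, not the extension's actual implementation:

   ```python
   import random

   def backoff_delays(base=1.0, cap=30.0, jitter=0.2, attempts=5, rng=random.random):
       """Yield delays (seconds) between successive Consul watch reconnect attempts.

       Delays grow exponentially from `base`, are capped at `cap`, and get
       +/- `jitter` fractional randomization so that many watchers restarting
       at once do not reconnect in lockstep.
       """
       for attempt in range(attempts):
           delay = min(cap, base * (2 ** attempt))
           # rng() is in [0, 1); spread the delay across [1 - jitter, 1 + jitter).
           yield delay * (1 - jitter + 2 * jitter * rng())

   # Deterministic rng for demonstration; midpoint jitter leaves delays unchanged.
   print([round(d, 2) for d in backoff_delays(rng=lambda: 0.5)])
   # → [1.0, 2.0, 4.0, 8.0, 16.0]
   ```

   A real watch loop would sleep for each yielded delay after a failed blocking query and reset the sequence once a watch succeeds.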


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

