Jackie-Jiang opened a new pull request #3885: Actively check cluster changes if there is no callback for a long time
URL: https://github.com/apache/incubator-pinot/pull/3885

We encountered a site issue recently and suspect that the Helix callback for cluster changes was not working, probably because of a ZK reconnection. This PR enables a proactive cluster change check if there has been no callback for 1 hour.

Changes include:
1. Rewrite ClusterChangeMediator to proactively perform the cluster change check
2. Disable the Helix batch mode and perform deduplication in ClusterChangeMediator
3. Disable the Helix pre-fetch to reduce ZK accesses
4. Add the interface ClusterChangeHandler for general cluster change handling
5. Add ClusterChangeHandler implementations: ExternalViewChangeHandler, InstanceConfigChangeHandler, LiveInstanceChangeHandler
6. Add metrics to track CLUSTER_CHANGE_QUEUE_TIME and the PROACTIVE_CLUSTER_CHANGE_CHECK count
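
For readers who want a picture of the mechanism described above, here is a minimal sketch in Java. It is not the code from this PR. The class name ClusterChangeMediatorSketch, the single-method ClusterChangeHandler interface, the ChangeType enum, and every method signature are assumptions made for illustration, and time is passed in explicitly so the behavior is deterministic. It shows two ideas from the description: at most one pending change is kept per change type (deduplication), and a handler is run proactively when no change has been processed for the configured interval.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical sketch only; not the ClusterChangeMediator from the PR. */
final class ClusterChangeMediatorSketch {
  /** Assumed change types, named after the three handlers listed in the PR. */
  enum ChangeType { EXTERNAL_VIEW, INSTANCE_CONFIG, LIVE_INSTANCE }

  /** Assumed shape of a general cluster change handler. */
  interface ClusterChangeHandler {
    void processClusterChange();
  }

  private final Map<ChangeType, ClusterChangeHandler> handlers = new EnumMap<>(ChangeType.class);
  // At most one entry per type: the time the oldest unprocessed change arrived.
  private final Map<ChangeType, Long> pendingSinceMs = new EnumMap<>(ChangeType.class);
  private final Map<ChangeType, Long> lastProcessedMs = new EnumMap<>(ChangeType.class);
  private final long proactiveCheckIntervalMs;
  private final long startTimeMs;
  private long lastQueueTimeMs = -1; // stand-in for a queue-time metric
  private int proactiveCheckCount = 0; // stand-in for a proactive-check counter

  ClusterChangeMediatorSketch(long proactiveCheckIntervalMs, long startTimeMs) {
    this.proactiveCheckIntervalMs = proactiveCheckIntervalMs;
    this.startTimeMs = startTimeMs;
  }

  synchronized void register(ChangeType type, ClusterChangeHandler handler) {
    handlers.put(type, handler);
  }

  /** Called from a change callback; repeated calls before processing are deduplicated. */
  synchronized void enqueueChange(ChangeType type, long nowMs) {
    pendingSinceMs.putIfAbsent(type, nowMs);
  }

  /** One pass of the processing loop; a real mediator would call this from its own thread. */
  synchronized void runOnce(long nowMs) {
    for (Map.Entry<ChangeType, ClusterChangeHandler> entry : handlers.entrySet()) {
      ChangeType type = entry.getKey();
      Long enqueuedAtMs = pendingSinceMs.remove(type);
      long lastMs = lastProcessedMs.getOrDefault(type, startTimeMs);
      if (enqueuedAtMs != null) {
        // A callback arrived: record how long the change waited in the queue.
        lastQueueTimeMs = nowMs - enqueuedAtMs;
      } else if (nowMs - lastMs >= proactiveCheckIntervalMs) {
        // No callback for the whole interval: check proactively.
        proactiveCheckCount++;
      } else {
        continue; // Nothing pending and the interval has not elapsed.
      }
      entry.getValue().processClusterChange();
      lastProcessedMs.put(type, nowMs);
    }
  }

  synchronized long getLastQueueTimeMs() {
    return lastQueueTimeMs;
  }

  synchronized int getProactiveCheckCount() {
    return proactiveCheckCount;
  }
}
```

Keeping only the earliest enqueue time per type means a burst of callbacks results in a single handler run, and the recorded queue time reflects the change that waited longest.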
