jadami10 opened a new pull request, #13196: URL: https://github.com/apache/pinot/pull/13196
This is a `feature` that adds:

1. an API call to reset adaptive routing stats for one instance or for all instances
2. a config to automatically reset server stats for a new instance

This is a response to an issue we had a few months ago, where adaptive routing was sending an outsized number of queries to 2 of the 3 replicas we had configured. Restarting the brokers fixed the issue.

Adding the API feels non-controversial, and we've added it to our incident playbooks as a potential remediation when we see skewed routing.

While we've tested the config that automatically resets server stats, we have not enabled it by default. It's not immediately clear whether this is more valuable than just relying on autodecay, but as far as I can tell, if a server is removed and re-added, there's no reason to keep its previous stats around, since they will be invalid.

Unfortunately, it has been impossible to replicate the original incident, so we've mostly observed metrics in steady state or under synthetic load.
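For readers who want a concrete picture of the two behaviors, here is a minimal, self-contained sketch. It is not the code in this PR and does not use Pinot's actual classes or config keys: `RoutingStatsSketch`, `resetOnNewServer`, and every method name below are hypothetical, and a single latency average per server stands in for whatever stats the broker really keeps.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; names and structure are assumptions, not Pinot's API.
final class RoutingStatsSketch {
    // One exponentially weighted latency average per server instance.
    private final Map<String, Double> latencyEwmaMs = new ConcurrentHashMap<>();
    private final double alpha;              // weight given to the newest sample
    private final boolean resetOnNewServer;  // stands in for the new config flag

    RoutingStatsSketch(double alpha, boolean resetOnNewServer) {
        this.alpha = alpha;
        this.resetOnNewServer = resetOnNewServer;
    }

    // First sample seeds the average; later samples are blended in.
    void recordLatency(String server, double latencyMs) {
        latencyEwmaMs.merge(server, latencyMs,
            (prev, sample) -> alpha * sample + (1 - alpha) * prev);
    }

    // Returns null when no stats are held for the server.
    Double getLatencyEwmaMs(String server) {
        return latencyEwmaMs.get(server);
    }

    // Reset for a single instance; returns true if stats were actually dropped.
    boolean resetServer(String server) {
        return latencyEwmaMs.remove(server) != null;
    }

    // Reset for all instances.
    void resetAll() {
        latencyEwmaMs.clear();
    }

    // Called when a server (re)joins; drops stale stats only if the flag is on.
    void onServerAdded(String server) {
        if (resetOnNewServer) {
            latencyEwmaMs.remove(server);
        }
    }
}
```

With the flag off, a re-added server keeps its old average until decay washes it out; with the flag on, it starts from a clean slate, which is the trade-off described above.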
