jadami10 opened a new pull request, #13196:
URL: https://github.com/apache/pinot/pull/13196

   This is a `feature` that adds:
   1. an API call to reset adaptive routing stats for one instance or for all instances
   2. a config to automatically reset server stats when a new instance is added
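To make the two items above concrete, here is a minimal Java sketch of a broker-side stats registry. It is an assumption for illustration only. Every name in it (`RoutingStatsRegistry`, `resetInstance`, `resetAll`, `onInstanceAdded`, the `resetStatsOnNewInstance` flag, and the EWMA latency stat) is hypothetical. It is not Pinot's actual API, config key, or stats implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: names and the single EWMA-latency stat are illustrative,
// not Pinot's real classes, endpoints, or config keys.
final class RoutingStatsRegistry {
    // Per-instance exponentially weighted moving average of query latency (ms).
    private final Map<String, Double> ewmaLatencyMs = new ConcurrentHashMap<>();
    // Hypothetical flag mirroring item 2: drop stats when an instance (re-)joins.
    private final boolean resetStatsOnNewInstance;
    // Weight given to the newest latency sample (0 < alpha <= 1).
    private final double alpha;

    RoutingStatsRegistry(boolean resetStatsOnNewInstance, double alpha) {
        this.resetStatsOnNewInstance = resetStatsOnNewInstance;
        this.alpha = alpha;
    }

    // The first sample seeds the average; later samples are blended in with weight alpha.
    void recordLatency(String instanceId, double latencyMs) {
        ewmaLatencyMs.merge(instanceId, latencyMs,
            (old, sample) -> alpha * sample + (1 - alpha) * old);
    }

    // Item 1: reset stats for a single instance.
    void resetInstance(String instanceId) {
        ewmaLatencyMs.remove(instanceId);
    }

    // Item 1: reset stats for all instances.
    void resetAll() {
        ewmaLatencyMs.clear();
    }

    // Item 2: called when routing sees an instance join or rejoin.
    void onInstanceAdded(String instanceId) {
        if (resetStatsOnNewInstance) {
            ewmaLatencyMs.remove(instanceId);
        }
    }

    // Returns null when no stats are held for the instance.
    Double latencyEstimate(String instanceId) {
        return ewmaLatencyMs.get(instanceId);
    }
}
```

For example, with hypothetical instance IDs and `alpha = 0.5`, recording 100 ms and then 50 ms for `server_1` gives an estimate of 75.0. With the flag enabled, `onInstanceAdded("server_1")` then drops that entry so the rejoining server starts fresh.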
   
   This responds to an incident a few months ago in which adaptive routing sent a disproportionate share of queries to 2 of the 3 replicas we had configured. Restarting the brokers fixed the issue.
   
   Adding the API seems uncontroversial, and we've added it to our incident playbooks as a possible remediation when we see skewed routing.
   
   We've tested the configuration that automatically resets server stats, but we have not enabled it by default. It's not immediately clear whether this is more valuable than relying on autodecay alone. However, as far as I can tell, if a server is removed and re-added, there's no reason to keep its previous stats around, since they will no longer be valid.
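The trade-off with autodecay can be illustrated with a generic half-life decay. This is an assumption for illustration only, not how Pinot's autodecay is implemented. When no fresh samples arrive, a stale estimate is pulled gradually toward a neutral value, so it keeps skewing routing for a while; a reset discards it immediately. `DecayIllustration`, `decayToward`, and `halfLifeMs` are hypothetical names.

```java
// Illustrative only: a generic half-life decay, NOT Pinot's autodecay implementation.
final class DecayIllustration {
    // Pulls a stale estimate toward a neutral value. After each half-life, half of
    // the remaining gap is gone. halfLifeMs is a hypothetical parameter.
    static double decayToward(double staleValue, double neutralValue,
                              long elapsedMs, long halfLifeMs) {
        if (halfLifeMs <= 0) {
            throw new IllegalArgumentException("halfLifeMs must be positive");
        }
        double remainingFraction = Math.pow(0.5, (double) elapsedMs / halfLifeMs);
        return neutralValue + (staleValue - neutralValue) * remainingFraction;
    }
}
```

Take a stale 500 ms estimate, a 50 ms neutral value, and a 10-second half-life. The estimate decays to 275 ms after 10 s and to 106.25 ms after 30 s, whereas a reset removes it at once.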
   
   Unfortunately, we have not been able to reproduce the original incident, so we've mostly observed metrics in steady state or under synthetic load.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.