kroeders opened a new issue #10602:
URL: https://github.com/apache/druid/issues/10602


   Occasionally, historicals experience faults, ranging from hardware 
problems to corrupt segments and a variety of other issues, which cause 
queries to fail when they could instead be rerouted to other replicas. This 
has been discussed in [#5709](https://github.com/apache/incubator-druid/issues/5709).
   
   As a first attempt at mitigating this problem, we would like to track error 
rates for historicals in the broker and, when a historical's error rate 
exceeds some threshold, remove it from server selection as long as other 
replicas are available. 
   
   How can error rates be effectively tracked?
   
   - Maintaining total errors and total requests for each historical is 
straightforward, but a newly failing node becomes hard to detect as the total 
number of requests grows, because recent errors barely move the cumulative 
rate. 
   - Alternatively, the last N requests can be maintained in a list, where the 
oldest request is cycled out when request N+1 comes in and the error rate is 
adjusted for just these two requests. 
   - Similarly, requests can be maintained for the last N seconds and requests 
older than the window can be cycled out when new requests are made. 
   - Another option is to calculate a moving average based on the last N 
requests that have been made. 
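
   The "last N requests" option above could be sketched roughly as follows. 
This is only an illustration, not existing Druid code: the class name 
`ErrorRateWindow`, its methods, and the `minSamples` guard are all 
hypothetical. It keeps a fixed-size ring buffer of outcomes so that recording 
a request and reading the error rate are both O(1).

```java
/**
 * Hypothetical helper (not part of Druid): tracks the error rate over the
 * last N requests sent to a single historical.
 */
final class ErrorRateWindow {
    private final boolean[] failed; // ring buffer, true = that request failed
    private int next = 0;           // slot the next outcome is written to
    private int size = 0;           // outcomes recorded so far, capped at N
    private int errors = 0;         // failures currently inside the window

    ErrorRateWindow(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.failed = new boolean[capacity];
    }

    /** Records one request outcome, evicting the oldest one once the window is full. */
    synchronized void record(boolean isError) {
        if (size == failed.length) {
            // Window is full: 'next' points at the oldest outcome, which is overwritten.
            if (failed[next]) {
                errors--;
            }
        } else {
            size++;
        }
        failed[next] = isError;
        if (isError) {
            errors++;
        }
        next = (next + 1) % failed.length;
    }

    /** Fraction of failed requests in the window, or 0.0 if nothing was recorded. */
    synchronized double errorRate() {
        return size == 0 ? 0.0 : (double) errors / size;
    }

    /** True if at least minSamples outcomes were seen and the rate is above the threshold. */
    synchronized boolean exceeds(double threshold, int minSamples) {
        return size >= minSamples && errorRate() > threshold;
    }
}
```

   The `minSamples` guard is an assumption added here so that a single failed 
request on a rarely queried historical does not immediately push it over the 
threshold.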
   
   Once the error rate exceeds a certain threshold, the historical is marked as 
faulty and will only be selected by CachingClusteredClient when there are no 
other replicas available for a given segment. The list of faulty historicals 
can be made available via API, and a faulty historical can be returned to 
circulation either after a certain amount of time has passed or when 
re-enabled via API. 
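
   A minimal sketch of the marking and fallback behaviour described above, 
under some assumptions: servers are identified by plain strings, the caller 
passes in the current time, and the class `FaultyServerRegistry` and all of 
its methods are hypothetical names. It does not show how this would be wired 
into CachingClusteredClient or the broker's existing server selection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry of faulty historicals (not part of Druid). */
final class FaultyServerRegistry {
    // Server name -> time (epoch millis) at which it returns to circulation.
    private final Map<String, Long> faultyUntil = new ConcurrentHashMap<>();
    private final long cooldownMillis;

    FaultyServerRegistry(long cooldownMillis) {
        this.cooldownMillis = cooldownMillis;
    }

    /** Marks a server as faulty for the configured cooldown period. */
    void markFaulty(String server, long nowMillis) {
        faultyUntil.put(server, nowMillis + cooldownMillis);
    }

    /** Manually returns a server to circulation, e.g. from an API call. */
    void enable(String server) {
        faultyUntil.remove(server);
    }

    /** True while the server's cooldown has not yet elapsed. */
    boolean isFaulty(String server, long nowMillis) {
        Long until = faultyUntil.get(server);
        if (until == null) {
            return false;
        }
        if (nowMillis >= until) {
            faultyUntil.remove(server, until); // cooldown elapsed, drop the entry
            return false;
        }
        return true;
    }

    /** Servers currently marked faulty, e.g. for exposing via an API. */
    List<String> faultyServers(long nowMillis) {
        List<String> result = new ArrayList<>();
        for (String server : faultyUntil.keySet()) {
            if (isFaulty(server, nowMillis)) {
                result.add(server);
            }
        }
        return result;
    }

    /**
     * Returns the healthy replicas for a segment. If every replica is
     * faulty, falls back to the full list so the query can still be tried.
     */
    List<String> candidates(List<String> replicas, long nowMillis) {
        List<String> healthy = new ArrayList<>();
        for (String replica : replicas) {
            if (!isFaulty(replica, nowMillis)) {
                healthy.add(replica);
            }
        }
        return healthy.isEmpty() ? new ArrayList<>(replicas) : healthy;
    }
}
```

   Falling back to the full replica list when every replica is faulty matches 
the "only selected when there are no other replicas available" rule: a faulty 
historical is deprioritized rather than excluded outright.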
   
   Does this seem like a reasonable approach?
   




