kroeders opened a new issue #10602: URL: https://github.com/apache/druid/issues/10602
Occasionally, historicals experience faults ranging from hardware problems to corrupt segments, along with a variety of other issues. These faults cause queries to fail even when they could be rerouted to other replicas. This has been discussed in [5709](https://github.com/apache/incubator-druid/issues/5709).

As a first attempt at mitigating this problem, we would like to track error rates for historicals in the broker and, when a historical's error rate exceeds some threshold, remove it from server selection as long as other alternatives are available.

How can error rates be effectively tracked?

- Maintaining total errors and total requests for each historical is straightforward, but as the total number of requests grows large, newly failing nodes become difficult to detect.
- Alternatively, the last N requests can be maintained in a list: when request N+1 comes in, the oldest request is cycled out and the error rate is adjusted for these two requests.
- Similarly, requests can be maintained for the last N seconds, with requests older than the window cycled out when new requests are made.
- Another option is to calculate a moving average based on the last N requests that have been made.

Once the error rate exceeds a certain threshold, the historical is marked as faulty and will only be selected by CachingClusteredClient when there are no other replicas available for a given segment. The list of faulty historicals can be made available via API, and a historical can be returned to circulation either after a certain amount of time has passed or when enabled via API.

Does this seem like a reasonable approach?
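
To make the "last N requests" option and the faulty-marking rule from the issue concrete, here is a minimal sketch in Java. It is not Druid code: the class `SlidingWindowErrorTracker`, its methods, and the `selectable` helper are hypothetical names chosen for illustration, servers are identified by plain strings, and the rule that a window must be full before a server can be marked faulty is an added assumption, not something the issue specifies.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not part of Druid): tracks the outcome of the last
 * N requests sent to one historical and reports the windowed error rate.
 */
final class SlidingWindowErrorTracker
{
  private final int windowSize;    // N: number of most recent requests kept
  private final double threshold;  // error rate above which the server counts as faulty
  private final Deque<Boolean> outcomes = new ArrayDeque<>(); // true = request failed
  private int errorsInWindow = 0;

  SlidingWindowErrorTracker(int windowSize, double threshold)
  {
    if (windowSize <= 0) {
      throw new IllegalArgumentException("windowSize must be positive");
    }
    this.windowSize = windowSize;
    this.threshold = threshold;
  }

  /** Records one request outcome, cycling out the oldest one when the window is full. */
  synchronized void record(boolean failed)
  {
    if (outcomes.size() == windowSize) {
      // Adjust the error count for the request leaving the window.
      if (outcomes.removeFirst()) {
        errorsInWindow--;
      }
    }
    outcomes.addLast(failed);
    if (failed) {
      errorsInWindow++;
    }
  }

  /** Fraction of failed requests among those currently in the window. */
  synchronized double errorRate()
  {
    return outcomes.isEmpty() ? 0.0 : (double) errorsInWindow / outcomes.size();
  }

  /** Assumption: only a full window can mark a server faulty, to avoid reacting to one early failure. */
  synchronized boolean isFaulty()
  {
    return outcomes.size() == windowSize && errorRate() > threshold;
  }

  /**
   * Returns the replicas that are not faulty. If every replica is faulty,
   * returns all of them, so a segment with no healthy replica is still queryable.
   */
  static List<String> selectable(List<String> replicas, Map<String, SlidingWindowErrorTracker> trackers)
  {
    List<String> healthy = new ArrayList<>();
    for (String server : replicas) {
      SlidingWindowErrorTracker tracker = trackers.get(server);
      if (tracker == null || !tracker.isFaulty()) {
        healthy.add(server);
      }
    }
    return healthy.isEmpty() ? replicas : healthy;
  }
}
```

Because each new outcome replaces the oldest one, the cost per request is constant and a newly failing node shows up within at most N requests, which is the property the running-totals approach lacks. A time-based window or a moving average would change only `record` and `errorRate`; the selection fallback in `selectable` would stay the same.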
