gianm edited a comment on issue #5709: Broker resiliency to misbehaving 
historical nodes
URL: 
https://github.com/apache/incubator-druid/issues/5709#issuecomment-414175186
 
 
   Hi @peferron,
   
   That scope sounds useful for an initial patch. I think the biggest risk is 
that queries that are doomed to fail, possibly because they exceed resource 
limits, will get retried too often and double or triple the load on the 
cluster (depending on how many retries are allowed). Some suggestions to 
mitigate that:
   
   - Check the error code (if there is one) and don't retry on codes like 
RESOURCE_LIMIT_EXCEEDED, UNAUTHORIZED, or QUERY_TIMEOUT. (The last one 
because the overall query timeout has probably passed by then anyway.)
   - Don't retry more than X subqueries per query.
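
   A rough sketch of how the two suggestions above could fit together. This is 
not Druid code: the class name `SubqueryRetryPolicy`, the method 
`tryAcquireRetry`, and the use of plain strings for error codes are all 
assumptions made for illustration. The error-code names are taken from the 
list above, and the per-query budget plays the role of "X".

```java
import java.util.Set;

/**
 * Hypothetical per-query retry policy for broker subqueries.
 * One instance is assumed to be created for each top-level query.
 */
final class SubqueryRetryPolicy {
    // Error codes listed in the comment above as not worth retrying.
    private static final Set<String> NON_RETRYABLE = Set.of(
        "RESOURCE_LIMIT_EXCEEDED",
        "UNAUTHORIZED",
        "QUERY_TIMEOUT");

    private final int maxRetriesPerQuery; // the "X" in "no more than X subqueries"
    private int retriesUsed = 0;

    SubqueryRetryPolicy(int maxRetriesPerQuery) {
        if (maxRetriesPerQuery < 0) {
            throw new IllegalArgumentException("maxRetriesPerQuery must be >= 0");
        }
        this.maxRetriesPerQuery = maxRetriesPerQuery;
    }

    /**
     * Returns true and consumes one unit of the retry budget if the failed
     * subquery may be retried. errorCode may be null when the failure
     * carried no error code; such failures are treated as retryable here.
     */
    synchronized boolean tryAcquireRetry(String errorCode) {
        if (errorCode != null && NON_RETRYABLE.contains(errorCode)) {
            return false; // the query is doomed; retrying only adds load
        }
        if (retriesUsed >= maxRetriesPerQuery) {
            return false; // per-query budget exhausted
        }
        retriesUsed++;
        return true;
    }

    synchronized int retriesUsed() {
        return retriesUsed;
    }
}
```

   Sharing one budget across all subqueries of a query, rather than giving each 
subquery its own retry count, is what bounds the extra load on the cluster.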
   
   Another thing to think about: a subquery can fail midway through, after 
some of its results have already been retrieved (and partially processed) but 
before all of them have come in. In this case, it's probably not possible for 
the broker to recover, since the partial subquery results have already been 
mixed into the overall query results. The query may need to be retried from 
scratch.
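
   One way to picture the partial-results case is a small tracker that records 
whether any rows from a subquery have already been merged. The names 
`SubqueryProgress`, `onRowMerged`, and `Recovery` are hypothetical and do not 
correspond to anything in Druid; the sketch only illustrates the decision 
described above.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical tracker for one in-flight subquery. */
final class SubqueryProgress {
    enum Recovery { RETRY_SUBQUERY, RETRY_WHOLE_QUERY }

    // Number of rows from this subquery already merged into the overall results.
    private final AtomicLong rowsMerged = new AtomicLong();

    /** Called each time a row from this subquery is merged. */
    void onRowMerged() {
        rowsMerged.incrementAndGet();
    }

    /**
     * Decides how a failure could be handled. If nothing has been merged
     * yet, the subquery alone can be retried. Otherwise its partial results
     * cannot be separated from the rest, so only a retry of the whole query
     * would be safe.
     */
    Recovery onFailure() {
        return rowsMerged.get() == 0
            ? Recovery.RETRY_SUBQUERY
            : Recovery.RETRY_WHOLE_QUERY;
    }
}
```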


