gianm commented on issue #5709: Broker resiliency to misbehaving historical 
nodes
URL: 
https://github.com/apache/incubator-druid/issues/5709#issuecomment-414175186
 
 
   Hi @peferron,
   
   That scope sounds useful for an initial patch. I think the biggest risk is 
that queries that are doomed to failure, possibly because of resource limits 
being exceeded, will get retried too much and double/triple the load on the 
cluster (depending on how many retries are allowed). Some suggestions to 
mitigate that:
   
   - Check the error code (if there is one) and don't retry on codes like 
RESOURCE_LIMIT_EXCEEDED, UNAUTHORIZED, or QUERY_TIMEOUT. (The last one 
because the overall timeout of the query has probably passed by then anyway.)
   - Don't retry more than X subqueries per query.
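
   A minimal sketch of how those two checks could combine, written in Python 
for brevity and not taken from Druid's code. `RetryBudget`, `should_retry`, 
and the `max_subquery_retries` parameter are hypothetical names; only the 
three error codes come from the list above.

```python
# Error codes that should never trigger a retry (taken from the list above).
NON_RETRYABLE_CODES = frozenset(
    {"RESOURCE_LIMIT_EXCEEDED", "UNAUTHORIZED", "QUERY_TIMEOUT"}
)


class RetryBudget:
    """Hypothetical per-query budget: at most X subquery retries in total."""

    def __init__(self, max_subquery_retries):
        if max_subquery_retries < 0:
            raise ValueError("max_subquery_retries must be >= 0")
        self.remaining = max_subquery_retries

    def should_retry(self, error_code=None):
        """Return True and consume one retry if the failure may be retried."""
        # Failures that are doomed to repeat do not consume the budget.
        if error_code in NON_RETRYABLE_CODES:
            return False
        # The budget is shared by all subqueries of one query.
        if self.remaining <= 0:
            return False
        self.remaining -= 1
        return True
```

   One budget object would be created per query and shared by all of its 
subqueries, so the extra load is bounded by X no matter how many subqueries 
fail.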
   
   Another thing to think about is that it is possible for results to be 
partially retrieved (and partially processed) and then for the query to fail 
midway through. In this case, it's probably not possible to recover, since 
subquery results have already been mixed into the overall query results. The 
query may need to be retried from scratch.
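
   One way to make that distinction explicit is to track whether any rows 
from a subquery have already been handed downstream. The following is a 
sketch only, in Python rather than Druid's Java; `stream_subquery`, 
`FullRetryRequired`, and `max_attempts` are hypothetical names, and 
`run_subquery` stands in for whatever call fetches results from a historical.

```python
class FullRetryRequired(Exception):
    """Raised when a subquery fails after some of its rows were consumed."""


def stream_subquery(run_subquery, max_attempts=2):
    """Yield rows from run_subquery(), retrying only while nothing was emitted."""
    attempts = 0
    while True:
        attempts += 1
        emitted = False
        try:
            for row in run_subquery():
                emitted = True
                yield row
            return
        except Exception as exc:
            if emitted:
                # Rows are already mixed into the overall results, so a
                # subquery-level retry could double count them.
                raise FullRetryRequired("subquery failed midway") from exc
            if attempts >= max_attempts:
                raise
            # Nothing was emitted yet, so retrying this subquery is safe.
```

   A caller that catches `FullRetryRequired` would discard the merged results 
and rerun the whole query, if it retries at all.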
