peferron commented on issue #5709: Broker resiliency to misbehaving historical nodes
URL: https://github.com/apache/incubator-druid/issues/5709#issuecomment-414190292
 
 
   Thanks for the feedback @gianm.
   
   Don't these issues already exist today when 
`druid.broker.retryPolicy.numTries > 1`? The idea would be to respect this 
parameter, but in a smarter way, by trying a different replica (if available) 
on each try.
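
   To make the idea concrete, here is a rough sketch of "a different replica on each try". It is written in Python for brevity and is not Druid code: `query_with_replica_rotation`, `run_query`, and `QueryFailed` are hypothetical names, and real replica selection would go through the broker's own server selection logic.

```python
from typing import Callable, Sequence


class QueryFailed(Exception):
    """Raised by the hypothetical run_query callback when a replica fails."""


def query_with_replica_rotation(
    replicas: Sequence[str],
    run_query: Callable[[str], str],
    num_tries: int,
) -> str:
    """Make at most num_tries attempts, moving to the next replica each time."""
    if num_tries < 1 or not replicas:
        raise ValueError("need at least one try and one replica")
    last_error = None
    for attempt in range(num_tries):
        # Wrap around when there are fewer replicas than tries.
        replica = replicas[attempt % len(replicas)]
        try:
            return run_query(replica)
        except QueryFailed as err:
            last_error = err
    # Every attempt failed: surface the most recent failure.
    raise last_error
```

   The total number of attempts stays capped at `num_tries`, so the parameter keeps its meaning; only the target of each attempt changes. With a single replica this degrades to the current behavior of hitting the same node again.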
   
   You're right that it could create more load in some situations, though. For 
example, a query that would have hit the same bad historical twice and failed 
immediately (in a scenario where failures are immediate) may now hit a good 
historical on the 2nd try and return successfully, at the expense of increased 
cluster load. If you set `numTries > 1` that's probably what you want, but we 
could put this new behavior behind an opt-in configuration parameter to be safe.
   
   I'm not sure how the partially processed results that you described are 
handled today when `druid.broker.retryPolicy.numTries > 1`. I need to spend a 
bit more time with the code.
   
   BTW, looking at `RetryQueryRunner`, it seems like we're trying `numTries + 
1` times total (one initial try + `numTries` retries), contrary to what the 
docs and the name of the parameter indicate. If that's indeed a bug, then 
everyone may already have retries enabled by default. I need to run a few unit 
tests to make sure I'm not missing anything there, though.
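
   For clarity, this is the counting I suspect, modeled as a small Python sketch. It is not the `RetryQueryRunner` code, and `run_with_retries` and `attempt_query` are hypothetical names; it only illustrates "one initial try plus `numTries` retries" and is accurate only if my reading of the code is right.

```python
from typing import Callable


def run_with_retries(attempt_query: Callable[[], bool], num_tries: int) -> int:
    """Model of the suspected behavior; returns the total number of attempts.

    attempt_query returns True on success and False on failure.
    """
    attempts = 1  # the initial try, which is not counted against num_tries
    if attempt_query():
        return attempts
    for _ in range(num_tries):  # then up to num_tries additional retries
        attempts += 1
        if attempt_query():
            break
    return attempts
```

   Under this model, a query that always fails is attempted twice when `num_tries` is 1, so a setting that reads as "try once" would already include one retry.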
   
