Since there are checks for submitting an invalid supervisor spec (e.g., if 
nothing in "bootstrap.servers" is resolvable), the error condition this is 
addressing seems to be situations where some external state changes and the 
supervisor spec no longer works. Taking the unresolvable bootstrap.servers 
issue as an example, maybe it's a transient DNS error, so I feel it's better 
to retry indefinitely without increasing backoff times (with retries tied to 
the configured supervisor run period instead). Otherwise, suppose the backoff 
time had become extremely large by the time the user fixed the underlying 
issue in their environment: with such a huge backoff they would have to 
restart the overlord for the change to take effect.

RetryUtils.retry() is also a blocking call, which I want to avoid here in the 
lifecycle startup.
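
To illustrate what I mean, here is a minimal sketch of a non-blocking retry at a fixed period, using only java.util.concurrent. The class name FixedPeriodRetrier and its methods are hypothetical and not part of Druid; periodMillis stands in for the configured supervisor run period, and the BooleanSupplier stands in for whatever the supervisor startup attempt would be.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical helper (not a Druid class): retries an attempt at a constant
// period on a scheduler thread, so the caller never blocks.
final class FixedPeriodRetrier
{
  private final ScheduledExecutorService exec;
  private final long periodMillis; // assumed to mirror the supervisor run period
  private final AtomicInteger attempts = new AtomicInteger();
  private volatile boolean succeeded = false;

  FixedPeriodRetrier(ScheduledExecutorService exec, long periodMillis)
  {
    this.exec = exec;
    this.periodMillis = periodMillis;
  }

  // Returns immediately; the first attempt runs on the scheduler thread.
  void start(BooleanSupplier attempt)
  {
    exec.schedule(() -> runOnce(attempt), 0, TimeUnit.MILLISECONDS);
  }

  private void runOnce(BooleanSupplier attempt)
  {
    attempts.incrementAndGet();
    boolean ok;
    try {
      ok = attempt.getAsBoolean();
    }
    catch (RuntimeException e) {
      // Treat any failure (e.g. a transient DNS error) as retryable.
      ok = false;
    }
    if (ok) {
      succeeded = true;
    } else {
      // Constant delay: the wait never grows, so a fix made in the
      // environment is picked up within one period, without a restart.
      exec.schedule(() -> runOnce(attempt), periodMillis, TimeUnit.MILLISECONDS);
    }
  }

  int getAttempts()
  {
    return attempts.get();
  }

  boolean isSucceeded()
  {
    return succeeded;
  }
}
```

Because start() only schedules work, lifecycle startup is not held up the way it would be by a blocking retry loop, and the retry keeps going indefinitely until the attempt succeeds or the executor is shut down.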

[ Full content available at: 
https://github.com/apache/incubator-druid/pull/6383 ]