[ https://issues.apache.org/jira/browse/CURATOR-229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15963625#comment-15963625 ]
Andy Sloane commented on CURATOR-229:
-------------------------------------
Right, there are two cases (permanent and temporary), but in both cases I would
argue the behavior is undesirable.
If the host is truly not resolvable, then you get the above background thread
exception logged, and... nothing else is obviously wrong.
{{CuratorFrameworkImpl.start()}} returns without issue while the background
thread hangs, and unless you've registered an {{UnhandledErrorListener}} there's
no API-level indication that anything is wrong, at least if you're using
simple things like {{LeaderLatch}}.
If it's a temporary DNS failure and retrying would work, then retrying in the
background and not complaining in {{start()}} is fine, but if it's permanent
you're stuck, because the configuration error never bubbles up to the surface.
Even just throwing the error to the caller, rather than treating it as a
background exception within {{CuratorFrameworkImpl.start()}}, would improve
the situation. And if the client was previously connected and is reconnecting
outside of {{start()}}, then attempting to reconnect to ZooKeeper makes sense.
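To illustrate the distinction, here is a minimal sketch of a pre-flight check an application could run itself before calling {{start()}}: retry the lookup a bounded number of times to ride out a temporary failure, then throw to the caller if the host still does not resolve. This is not Curator code; the class {{DnsPreflight}}, the {{Resolver}} interface and the method {{resolveWithRetry}} are hypothetical names made up for this example, and it uses only {{java.net.InetAddress}} from the JDK.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Curator: resolves a host name with a
// bounded number of retries and surfaces a lasting failure to the caller.
final class DnsPreflight {

    // Hypothetical lookup abstraction so the retry logic can be exercised
    // without a real DNS server.
    @FunctionalInterface
    interface Resolver {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    private DnsPreflight() {
    }

    // Tries the lookup up to maxAttempts times, sleeping sleepMs between
    // failed attempts. Rethrows the last UnknownHostException if every
    // attempt fails, so the configuration error reaches the caller.
    static InetAddress resolveWithRetry(Resolver resolver, String host,
                                        int maxAttempts, long sleepMs)
            throws UnknownHostException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        UnknownHostException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return resolver.resolve(host);
            } catch (UnknownHostException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMs); // possibly temporary: wait, then retry
                }
            }
        }
        throw last; // treated as permanent: bubble up instead of hiding it
    }

    // Convenience overload that uses the JDK's own resolver.
    static InetAddress resolveWithRetry(String host, int maxAttempts, long sleepMs)
            throws UnknownHostException, InterruptedException {
        return resolveWithRetry(InetAddress::getByName, host, maxAttempts, sleepMs);
    }
}
```

An application could call something like {{DnsPreflight.resolveWithRetry(host, 3, 1000)}} for each host in its connect string before starting the client, and fail fast on {{UnknownHostException}}. The attempt count and sleep interval here are arbitrary placeholders, and a check like this only covers startup, not later reconnects.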
> No retry on DNS lookup failure
> ------------------------------
>
> Key: CURATOR-229
> URL: https://issues.apache.org/jira/browse/CURATOR-229
> Project: Apache Curator
> Issue Type: Bug
> Components: Framework
> Affects Versions: 2.7.0
> Reporter: Michael Putters
>
> Our environment is set up so that host names (rather than IP addresses) are
> used when registering services.
> When disconnecting a node from the network, it will attempt to reconnect and
> - in order to do this - attempts to resolve a host name, which fails (since
> we have no network connectivity and a DNS server is used).
> It appears this type of exception is not retryable, and the node simply gives
> up and never reconnects, even when the network connectivity is back.
> Is this the expected behavior? Is there any way to configure Curator so that
> this type of exception is retryable? I had a look at
> {{CuratorFrameworkImpl.java}} around line 768 but there doesn't seem to be
> anything configurable.
> If this is not the expected behavior (or if it is but you don't mind making
> it configurable), I should be able to provide a patch via a pull request.