[
https://issues.apache.org/jira/browse/HADOOP-16543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922904#comment-16922904
]
shanyu zhao commented on HADOOP-16543:
--------------------------------------
Hi [[email protected]], thanks for your suggestions.
1) We've tried changing the DNS TTL, with no luck.
2) The problem is that Hadoop's RMProxy caches the resolved InetSocketAddress and then retries connecting to the stale IP address:
{code:java}
InetSocketAddress rmAddress = rmProxy.getRMAddress(conf, protocol);
{code}
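To illustrate why this caching is a problem (this is a standalone sketch, not Hadoop code; the class and field names are made up): a java.net.InetSocketAddress resolves the hostname once at construction time and stores the IP, so reusing the same object never re-queries DNS. Re-resolution requires constructing a fresh InetSocketAddress from the hostname.

{code:java}
import java.net.InetSocketAddress;

// Illustrative sketch: a cached InetSocketAddress pins the client to the
// IP resolved at construction; building a new one re-queries DNS.
public class DnsCacheDemo {
    // Resolved once, at construction; the IP is frozen inside this object.
    static InetSocketAddress cached = new InetSocketAddress("localhost", 8032);

    // Rebuilding from the hostname triggers a fresh DNS lookup.
    static InetSocketAddress refresh(InetSocketAddress old) {
        return new InetSocketAddress(old.getHostName(), old.getPort());
    }

    public static void main(String[] args) {
        InetSocketAddress fresh = refresh(cached);
        // Both are resolved, but only "fresh" reflects DNS at call time.
        assert !cached.isUnresolved();
        assert !fresh.isUnresolved();
        assert fresh.getPort() == 8032;
    }
}
{code}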
The fix is to create these additional FailoverProxyProviders:
For the non-HA scenario:
- DefaultNoHaRMFailoverProxyProvider (no DNS re-resolution)
- AutoRefreshNoHaRMFailoverProxyProvider (re-resolves DNS during retries)
For the HA scenario:
- ConfiguredRMFailoverProxyProvider (no DNS re-resolution)
- AutoRefreshRMFailoverProxyProvider (re-resolves DNS during retries)
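The auto-refresh idea in the providers above can be sketched as follows. This is a minimal standalone illustration of the re-resolution behavior, not the actual patch: the class name, method names, and the performFailover hook are all hypothetical stand-ins for the real Hadoop FailoverProxyProvider interface.

{code:java}
import java.net.InetSocketAddress;

// Hypothetical sketch: on each failover/retry the provider rebuilds the RM
// address from the configured hostname, forcing a new DNS lookup, instead
// of reusing the InetSocketAddress cached at startup.
public class AutoRefreshSketch {
    private final String rmHost;
    private final int rmPort;
    private InetSocketAddress current;

    AutoRefreshSketch(String rmHost, int rmPort) {
        this.rmHost = rmHost;
        this.rmPort = rmPort;
        this.current = new InetSocketAddress(rmHost, rmPort); // resolved once
    }

    InetSocketAddress getAddress() {
        return current;
    }

    // Called on retry: drop the cached address and resolve the host again.
    void performFailover() {
        current = new InetSocketAddress(rmHost, rmPort);
    }

    public static void main(String[] args) {
        AutoRefreshSketch p = new AutoRefreshSketch("localhost", 8032);
        InetSocketAddress before = p.getAddress();
        p.performFailover();
        InetSocketAddress after = p.getAddress();
        // A fresh object is created, so a changed DNS record would be seen.
        assert before != after;
        assert after.getPort() == 8032;
    }
}
{code}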
And add this configuration key to select the provider in non-HA mode (in
addition to the existing yarn.client.failover-proxy-provider):
yarn.client.failover-no-ha-proxy-provider
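For example, selecting the auto-refresh provider in non-HA mode might look like this in yarn-site.xml (the value's package path is an assumption for illustration; check the committed patch for the actual class name):

{code:xml}
<!-- yarn-site.xml: illustrative value, not taken from the patch -->
<property>
  <name>yarn.client.failover-no-ha-proxy-provider</name>
  <value>org.apache.hadoop.yarn.client.AutoRefreshNoHaRMFailoverProxyProvider</value>
</property>
{code}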
> Cached DNS name resolution error
> --------------------------------
>
> Key: HADOOP-16543
> URL: https://issues.apache.org/jira/browse/HADOOP-16543
> Project: Hadoop Common
> Issue Type: Bug
> Affects Versions: 3.1.2
> Reporter: Roger Liu
> Priority: Major
>
> In Kubernetes, a node may go down and then come back later with a
> different IP address. Yarn clients which are already running will be unable
> to rediscover the node after it comes back up due to caching the original IP
> address. This is problematic for cases such as Spark HA on Kubernetes, as the
> node containing the resource manager may go down and come back up, meaning
> existing node managers must then also be restarted.
--
This message was sent by Atlassian Jira
(v8.3.2#803003)