[ https://issues.apache.org/jira/browse/KUDU-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17426418#comment-17426418 ]
ASF subversion and git services commented on KUDU-1885:
-------------------------------------------------------
Commit 1584e4325cfce37f89fd1dbebff5f24ed980780f in kudu's branch
refs/heads/master from Andrew Wong
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=1584e43 ]
[master] KUDU-1885: re-resolve TSDescriptor proxies on network error
This patch uses the capabilities added in KUDU-75 to add the ability for
the master to re-resolve the addresses of tablet servers if its requests
fail.
Change-Id: I6245f75a232fd4827de684cfc04d6b6e53b7ddef
Reviewed-on: http://gerrit.cloudera.org:8080/17909
Reviewed-by: Alexey Serbin <[email protected]>
Tested-by: Andrew Wong <[email protected]>
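
For readers unfamiliar with the pattern, the idea in the commit message (keep a cached proxy, but discard it and resolve the address again when a request fails with a network error) can be sketched roughly as below. This is an illustrative sketch only, not the code from the patch: CachedProxyHolder, Proxy, RpcStatus, SendWithReresolve, and the resolver callback are all hypothetical names invented for this example, and the single retry is an assumption.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for an RPC proxy bound to one resolved address.
struct Proxy {
  std::string resolved_addr;
};

// Hypothetical, simplified result of sending a request.
enum class RpcStatus { kOk, kNetworkError };

// Hypothetical holder that caches a proxy for a hostname. It is not Kudu's
// TSDescriptor; it only illustrates the caching behaviour described in the
// issue, plus a way to reset the cache.
class CachedProxyHolder {
 public:
  using Resolver = std::function<std::string(const std::string&)>;

  CachedProxyHolder(std::string hostname, Resolver resolver)
      : hostname_(std::move(hostname)), resolver_(std::move(resolver)) {
    assert(static_cast<bool>(resolver_));
  }

  // Returns the cached proxy, resolving the hostname only if nothing is
  // cached yet. Without ResetProxy(), the first resolution is kept forever.
  std::shared_ptr<Proxy> GetProxy() {
    if (!proxy_) {
      proxy_ = std::make_shared<Proxy>(Proxy{resolver_(hostname_)});
    }
    return proxy_;
  }

  // Drops the cached proxy so the next GetProxy() call resolves again.
  void ResetProxy() { proxy_.reset(); }

 private:
  std::string hostname_;
  Resolver resolver_;
  std::shared_ptr<Proxy> proxy_;
};

using SendFn = std::function<RpcStatus(const Proxy&)>;

// Sends once through the cached proxy. On a network error, resets the cache,
// re-resolves the address, and retries exactly once (an assumed policy).
inline RpcStatus SendWithReresolve(CachedProxyHolder& holder,
                                   const SendFn& send) {
  RpcStatus status = send(*holder.GetProxy());
  if (status != RpcStatus::kNetworkError) {
    return status;
  }
  holder.ResetProxy();
  return send(*holder.GetProxy());
}
```

In this sketch, a caller that only ever calls GetProxy() keeps using the first resolved address even after it changes, which is the failure mode the issue describes; going through SendWithReresolve() lets a failed request trigger a fresh resolution.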
> Master caches DNS name resolution forever
> -----------------------------------------
>
> Key: KUDU-1885
> URL: https://issues.apache.org/jira/browse/KUDU-1885
> Project: Kudu
> Issue Type: Bug
> Components: master
> Affects Versions: 1.3.0
> Reporter: Adar Dembo
> Priority: Major
>
> TSDescriptor::GetTSAdminProxy() and TSDescriptor::GetConsensusProxy() will
> return the same proxy instances over and over. Normally, this is a reasonable
> optimization. But suppose the IP address of the tserver changes (due to a
> DHCP lease expiring or some such). Now these methods will be returning
> unusable proxies, and there's no way to "reset" them.
>
> Admittedly this scenario is a little contrived: if a tserver's IP address
> suddenly changes, a bunch of other stuff will break too. The tserver will
> probably need to be restarted (since it's bound to a socket whose address no
> longer exists), and consensus may be thoroughly wrecked due to built-in
> host/port assumptions (see KUDU-418).
>
> An issue like this was reported by a user in Slack, who was running a master
> and tserver on the same box. The symptom was "half-open" communication
> between them: the tserver could heartbeat to the master, but the master could
> not send RPCs to the tserver.