[ https://issues.apache.org/jira/browse/KUDU-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17426418#comment-17426418 ]

ASF subversion and git services commented on KUDU-1885:
-------------------------------------------------------

Commit 1584e4325cfce37f89fd1dbebff5f24ed980780f in kudu's branch 
refs/heads/master from Andrew Wong
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=1584e43 ]

[master] KUDU-1885: re-resolve TSDescriptor proxies on network error

This patch uses the capabilities added in KUDU-75 to let the master
re-resolve the addresses of tablet servers when its requests to them
fail with a network error.

Change-Id: I6245f75a232fd4827de684cfc04d6b6e53b7ddef
Reviewed-on: http://gerrit.cloudera.org:8080/17909
Reviewed-by: Alexey Serbin <[email protected]>
Tested-by: Andrew Wong <[email protected]>
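The pattern the commit message describes can be sketched in C++. This is a minimal, hypothetical illustration, not the code from commit 1584e43. `TSProxyCache`, `Proxy`, and the injected `Resolver` are invented names, and the resolver is a plain function so the example runs without real DNS. The cached proxy is reused for normal requests, but it is discarded after a network error so that the next request resolves the tablet server's hostname again.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for an RPC proxy bound to one resolved address.
struct Proxy {
  explicit Proxy(std::string addr) : resolved_addr(std::move(addr)) {}
  std::string resolved_addr;
};

// Hypothetical resolver: maps a hostname to an IP address string.
using Resolver = std::function<std::string(const std::string& host)>;

// Caches one proxy per tablet server. Without InvalidateOnNetworkError(),
// the first resolved address would be reused forever (the KUDU-1885 bug).
class TSProxyCache {
 public:
  TSProxyCache(std::string host, Resolver resolver)
      : host_(std::move(host)), resolver_(std::move(resolver)) {}

  // Returns the cached proxy, resolving the hostname only on first use
  // or after the cache has been invalidated.
  std::shared_ptr<Proxy> GetProxy() {
    std::lock_guard<std::mutex> l(mu_);
    if (!proxy_) {
      proxy_ = std::make_shared<Proxy>(resolver_(host_));
      ++resolutions_;
    }
    return proxy_;
  }

  // Called when an RPC through the cached proxy fails with a network
  // error; the next GetProxy() call re-resolves the hostname.
  void InvalidateOnNetworkError() {
    std::lock_guard<std::mutex> l(mu_);
    proxy_.reset();
  }

  // Number of times the hostname has been resolved so far.
  int resolutions() const {
    std::lock_guard<std::mutex> l(mu_);
    return resolutions_;
  }

 private:
  const std::string host_;
  const Resolver resolver_;
  mutable std::mutex mu_;
  std::shared_ptr<Proxy> proxy_;
  int resolutions_ = 0;
};
```

Handing out `std::shared_ptr` means callers still holding the old proxy keep a valid (if stale) object after invalidation, while new requests pick up the re-resolved address.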


> Master caches DNS name resolution forever
> -----------------------------------------
>
>                 Key: KUDU-1885
>                 URL: https://issues.apache.org/jira/browse/KUDU-1885
>             Project: Kudu
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 1.3.0
>            Reporter: Adar Dembo
>            Priority: Major
>
> TSDescriptor::GetTSAdminProxy() and TSDescriptor::GetConsensusProxy() will
> return the same proxy instances over and over. Normally, this is a reasonable
> optimization. But suppose the IP address of the tserver changes (due to a
> DHCP lease expiring or some such). Now these methods will be returning
> unusable proxies, and there's no way to "reset" them.
>
> Admittedly, this scenario is a little contrived: if a tserver's IP address
> suddenly changes, a bunch of other stuff will break too. The tserver will
> probably need to be restarted (since it's bound to a socket whose address no
> longer exists), and consensus may be thoroughly wrecked due to built-in
> host/port assumptions (see KUDU-418).
>
> An issue like this was reported by a user in Slack, who was running a master
> and tserver on the same box. The symptom was "half-open" communication
> between them: the tserver could heartbeat to the master, but the master could
> not send RPCs to the tserver.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)