[ 
https://issues.apache.org/jira/browse/HBASE-25292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-25292:
----------------------------------------
    Status: Patch Available  (was: Open)

> Improve InetSocketAddress usage discipline
> ------------------------------------------
>
>                 Key: HBASE-25292
>                 URL: https://issues.apache.org/jira/browse/HBASE-25292
>             Project: HBase
>          Issue Type: Bug
>          Components: Client, HFile
>            Reporter: Andrew Kyle Purtell
>            Assignee: Andrew Kyle Purtell
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 1.7.0, 2.5.0
>
>
> We sometimes cache InetSocketAddress in data structures in an attempt to 
> optimize away potential nameservice (DNS) lookups. This is, in general, an 
> anti-pattern, because once an InetSocketAddress is resolved, resolution is 
> never attempted again. The ideal pattern for connect() is ISA instantiation 
> just before the connect() call, with no reuse of the ISA instance. For bind() 
> we presume the local identity won't change while the process is live so usage 
> and caching can be relaxed in that case.
> If I can restate my proposal for a usage convention for InetSocketAddress, it 
> would be this: Network identities should be bound late. This means addresses 
> should be resolved at the last possible moment. Also, network identity 
> mappings can change, so our code should not inappropriately cache them; 
> otherwise we might miss a change and fail to operate normally.
> I have reviewed the code for InetSocketAddress usage and in my opinion 
> sometimes we are caching ISA acceptably, and in other cases we are not.
> Correct cases:
>  * We cache ISA for RPC connections, so we don't potentially do a lookup for 
> every Call. However, we resolve the address earlier than we need to. The code 
> can be improved by moving resolution to just before where we connect().
>  * Use of ISA with bind. Typical uses like bindAddress, listenerAddress, 
> initialIsa, or localAddress.
>  ** (There is no harm to keep direct use of ISA for bind() but these could 
> all be replaced with Address and on-demand create of ISA just before bind().
>  * ClusterStatusPublisher in master.
>  * Netty integration. Netty accepts and supplies ISA, no choice there, but we 
> want to resolve at channel create time and cache for lifetime of the channel 
> anyway. If a remote host goes away and is replaced with a new identity, a new 
> channel/connection will be created with a new resolution just before 
> connect() via higher layer error handling.
> Incorrect cases that can be fixed:
>  * RPC stubs. Remote clients may be recycled and replaced with new instances 
> where the network identities (DNS name to IP address mapping) have changed--. 
> HBASE-14544 attempts to work around DNS instability in data centers of years 
> past in a way that, in my opinion, is the wrong thing to do in the modern 
> era. In modern datacenters, in public cloud, and especially in kubernetes 
> environments, DNS mappings are dynamic and subject to frequent change. It is 
> just never the right thing to do to cache them. I intend to propose a revert 
> of HBASE-14544. Reverting this simplifies some code a bit. That is the only 
> reason: this is in my opinion some legacy that can be dropped, one fewer 
> configuration variable (yay!), but if this part of the proposal is 
> controversial it can be skipped.
>  * RPC stubs again. When looking up the IP address of the remote host when 
> creating a stub key we also make a key even if the resolution fails. This is 
> the wrong thing to do. If we can't resolve the remote address, we can't 
> contact the server. Making a stub that can't communicate is pointless. Throw 
> an exception instead.
>  * Favored nodes. Although the HDFS API requires InetSocketAddress, we don't 
> have to make up a list right away and cache them forever. We can use Address 
> to record the list of favored nodes and convert from Address to 
> InetSocketAddress on demand (when we go to create the HFile). This will allow 
> us to resolve datanode hostnames just before they are needed. In public 
> cloud, kubernetes, and or some private datacenter service deployment options, 
> datanode servers may have their network identities (DNS name -> IP address 
> mapping) changed over time. We can and should avoid inappropriate caching 
> that may cause us to indefinitely use an incorrect address when contacting a 
> favored node. 
>  * Sometimes we use ISA when Address is just as good. For example, the dead 
> servers list. If we are going to pay some attention to ISA usage discipline, 
> let's remove the cases where we use ISA as a host and port pair but do not 
> need to do so. Address works just as well and doesn't present an opportunity 
> for misuse. Another example would be the RPC client concurrentCounterCache.
>  ** We could do a lot more substitutions than what is proposed. All of the 
> ISA uses with bind() are okay as is but could also be updated.
> Incorrect cases that cannot be fixed:
>  * hbase-external-blockcache: We have to resolve all of the memcached 
> locations up front because the memcached client constructor requires ISA 
> instances. So we have to hope that the network identities (DNS name -> IP 
> address mapping) does not change for any in the list. This is beyond our 
> control.
> While in this area it is trivial to add new client connect metrics for number 
> of potential nameservice lookups (whenever we instantiate an ISA) and number 
> of failed nameservice lookups (if the instantiated ISA is unresolved).
> While in this area I also noticed we often directly access a field in 
> ConnectionId where there is also a getter, so good practice is to use the 
> getter instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to