[ 
https://issues.apache.org/jira/browse/HBASE-25292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-25292:
----------------------------------------
    Description: 
We sometimes cache InetSocketAddress in data structures in an attempt to 
optimize away potential nameservice (DNS) lookups. This is, in general, an 
anti-pattern, because once an InetSocketAddress is resolved, resolution is 
never attempted again. The ideal pattern for connect() is ISA instantiation 
just before the connect() call, with no reuse of the ISA instance. For bind() 
we presume the local identity won't change while the process is live so usage 
and caching can be relaxed in that case.

If I can restate my proposal for a usage convention for InetSocketAddress, it 
would be this: Network identities should be bound late. This means addresses 
should be resolved at the last possible moment. Also, network identity mappings 
can change, so our code should not inappropriately cache them; otherwise we 
might miss a change and fail to operate normally.

I have reviewed the code for InetSocketAddress usage and in my opinion 
sometimes we are caching ISA acceptably, and in other cases we are not.

Correct cases:
 * We cache ISA for RPC connections, so we don't potentially do a lookup for 
every Call. However, we resolve the address earlier than we need to. The code 
can be improved by moving resolution to just before where we connect().
 * Use of ISA with bind. Typical uses like bindAddress or localAddress.
 * ClusterStatusPublisher in master.
 * Netty integration. Netty accepts and supplies ISA, no choice there, but we 
want to resolve at channel create time and cache for lifetime of the channel 
anyway. If a remote host goes away and is replaced with a new identity, a new 
channel/connection will be created with a new resolution just before connect() 
via higher layer error handling.

Incorrect cases that can be fixed:
 * RPC stubs. Remote clients may be recycled and replaced with new instances 
where the network identities (DNS name to IP address mapping) have changed--. 
HBASE-14544 attempts to work around DNS instability in data centers of years 
past in a way that, in my opinion, is the wrong thing to do in the modern era. 
In modern datacenters, in public cloud, and especially in kubernetes 
environments, DNS mappings are dynamic and subject to frequent change. It is 
just never the right thing to do to cache them. I intend to propose a revert of 
HBASE-14544. Reverting this simplifies some code a bit. That is the only 
reason: this is in my opinion some legacy that can be dropped, one fewer 
configuration variable (yay!), but if this part of the proposal is 
controversial it can be skipped.
 * RPC stubs again. When looking up the IP address of the remote host when 
creating a stub key we also make a key even if the resolution fails. This is 
the wrong thing to do. If we can't resolve the remote address, we can't contact 
the server. Making a stub that can't communicate is pointless. Throw an 
exception instead.
 * Favored nodes. Although the HDFS API requires InetSocketAddress, we don't 
have to make up a list right away and cache them forever. We can use Address to 
record the list of favored nodes and convert from Address to InetSocketAddress 
on demand (when we go to create the HFile). This will allow us to resolve 
datanode hostnames just before they are needed. In public cloud, kubernetes, 
and or some private datacenter service deployment options, datanode servers may 
have their network identities (DNS name -> IP address mapping) changed over 
time. We can and should avoid inappropriate caching that may cause us to 
indefinitely use an incorrect address when contacting a favored node. 
 * Sometimes we use ISA when Address is just as good. For example, the dead 
servers list. If we are going to pay some attention to ISA usage discipline, 
let's remove the cases where we use ISA as a host and port pair but do not need 
to do so. Address works just as well and doesn't present an opportunity for 
misuse. Another example would be the RPC client concurrentCounterCache.

Incorrect cases that cannot be fixed:
 * hbase-external-blockcache: We have to resolve all of the memcached locations 
up front because the memcached client constructor requires ISA instances. So we 
have to hope that the network identities (DNS name -> IP address mapping) does 
not change for any in the list. This is beyond our control.

While in this area it is trivial to add new client connect metrics for number 
of potential nameservice lookups (whenever we instantiate an ISA) and number of 
failed nameservice lookups (if the instantiated ISA is unresolved).

While in this area I also noticed we often directly access a field in 
ConnectionId where there is also a getter, so good practice is to use the 
getter instead.

  was:
We sometimes cache InetSocketAddress in data structures in an attempt to 
optimize away potential nameservice (DNS) lookups. This is, in general, an 
anti-pattern, because once an InetSocketAddress is resolved, resolution is 
never attempted again. The ideal pattern for connect() is ISA instantiation 
just before the connect() call, with no reuse of the ISA instance. For bind() 
we presume the local identity won't change while the process is live so usage 
and caching can be relaxed in that case.

If I can restate my proposal for a usage convention for InetSocketAddress, it 
would be this: Network identities should be bound late. This means addresses 
should be resolved at the last possible moment. Also, network identity mappings 
can change, so our code should not inappropriately cache them; otherwise we 
might miss a change and fail to operate normally.

I have reviewed the code for InetSocketAddress usage and in my opinion 
sometimes we are caching ISA acceptably, and in other cases we are not.

Correct cases:
 * We cache ISA for RPC connections, so we don't potentially do a lookup for 
every Call. However, we resolve the address earlier than we need to. The code 
can be improved by moving resolution to just before where we connect().

Incorrect cases that can be fixed:
 * RPC stubs. Remote clients may be recycled and replaced with new instances 
where the network identities (DNS name to IP address mapping) have changed--. 
HBASE-14544 attempts to work around DNS instability in data centers of years 
past in a way that, in my opinion, is the wrong thing to do in the modern era. 
In modern datacenters, in public cloud, and especially in kubernetes 
environments, DNS mappings are dynamic and subject to frequent change. It is 
just never the right thing to do to cache them. I intend to propose a revert of 
HBASE-14544. Reverting this simplifies some code a bit. That is the only 
reason: this is in my opinion some legacy that can be dropped, one fewer 
configuration variable (yay!), but if this part of the proposal is 
controversial it can be skipped.
 * RPC stubs again. When looking up the IP address of the remote host when 
creating a stub key we also make a key even if the resolution fails. This is 
the wrong thing to do. If we can't resolve the remote address, we can't contact 
the server. Making a stub that can't communicate is pointless. Throw an 
exception instead.
 * Favored nodes. Although the HDFS API requires InetSocketAddress, we don't 
have to make up a list right away and cache them forever. We can use Address to 
record the list of favored nodes and convert from Address to InetSocketAddress 
on demand (when we go to create the HFile). This will allow us to resolve 
datanode hostnames just before they are needed. In public cloud, kubernetes, 
and or some private datacenter service deployment options, datanode servers may 
have their network identities (DNS name -> IP address mapping) changed over 
time. We can and should avoid inappropriate caching that may cause us to 
indefinitely use an incorrect address when contacting a favored node. 
 * Sometimes we use ISA when Address is just as good. For example, the dead 
servers list. If we are going to pay some attention to ISA usage discipline, 
let's remove the cases where we use ISA as a host and port pair but do not need 
to do so. Address works just as well and doesn't present an opportunity for 
misuse. Another example would be the RPC client concurrentCounterCache.

Incorrect cases that cannot be fixed:
 * hbase-external-blockcache: We have to resolve all of the memcached locations 
up front because the memcached client constructor requires ISA instances. So we 
have to hope that the network identities (DNS name -> IP address mapping) does 
not change for any in the list. This is beyond our control.

While in this area it is trivial to add new client connect metrics for number 
of potential nameservice lookups (whenever we instantiate an ISA) and number of 
failed nameservice lookups (if the instantiated ISA is unresolved).

While in this area I also noticed we often directly access a field in 
ConnectionId where there is also a getter, so good practice is to use the 
getter instead.


> Improve InetSocketAddress usage discipline
> ------------------------------------------
>
>                 Key: HBASE-25292
>                 URL: https://issues.apache.org/jira/browse/HBASE-25292
>             Project: HBase
>          Issue Type: Bug
>          Components: Client, HFile
>            Reporter: Andrew Kyle Purtell
>            Assignee: Andrew Kyle Purtell
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 2.5.0
>
>
> We sometimes cache InetSocketAddress in data structures in an attempt to 
> optimize away potential nameservice (DNS) lookups. This is, in general, an 
> anti-pattern, because once an InetSocketAddress is resolved, resolution is 
> never attempted again. The ideal pattern for connect() is ISA instantiation 
> just before the connect() call, with no reuse of the ISA instance. For bind() 
> we presume the local identity won't change while the process is live so usage 
> and caching can be relaxed in that case.
> If I can restate my proposal for a usage convention for InetSocketAddress, it 
> would be this: Network identities should be bound late. This means addresses 
> should be resolved at the last possible moment. Also, network identity 
> mappings can change, so our code should not inappropriately cache them; 
> otherwise we might miss a change and fail to operate normally.
> I have reviewed the code for InetSocketAddress usage and in my opinion 
> sometimes we are caching ISA acceptably, and in other cases we are not.
> Correct cases:
>  * We cache ISA for RPC connections, so we don't potentially do a lookup for 
> every Call. However, we resolve the address earlier than we need to. The code 
> can be improved by moving resolution to just before where we connect().
>  * Use of ISA with bind. Typical uses like bindAddress or localAddress.
>  * ClusterStatusPublisher in master.
>  * Netty integration. Netty accepts and supplies ISA, no choice there, but we 
> want to resolve at channel create time and cache for lifetime of the channel 
> anyway. If a remote host goes away and is replaced with a new identity, a new 
> channel/connection will be created with a new resolution just before 
> connect() via higher layer error handling.
> Incorrect cases that can be fixed:
>  * RPC stubs. Remote clients may be recycled and replaced with new instances 
> where the network identities (DNS name to IP address mapping) have changed--. 
> HBASE-14544 attempts to work around DNS instability in data centers of years 
> past in a way that, in my opinion, is the wrong thing to do in the modern 
> era. In modern datacenters, in public cloud, and especially in kubernetes 
> environments, DNS mappings are dynamic and subject to frequent change. It is 
> just never the right thing to do to cache them. I intend to propose a revert 
> of HBASE-14544. Reverting this simplifies some code a bit. That is the only 
> reason: this is in my opinion some legacy that can be dropped, one fewer 
> configuration variable (yay!), but if this part of the proposal is 
> controversial it can be skipped.
>  * RPC stubs again. When looking up the IP address of the remote host when 
> creating a stub key we also make a key even if the resolution fails. This is 
> the wrong thing to do. If we can't resolve the remote address, we can't 
> contact the server. Making a stub that can't communicate is pointless. Throw 
> an exception instead.
>  * Favored nodes. Although the HDFS API requires InetSocketAddress, we don't 
> have to make up a list right away and cache them forever. We can use Address 
> to record the list of favored nodes and convert from Address to 
> InetSocketAddress on demand (when we go to create the HFile). This will allow 
> us to resolve datanode hostnames just before they are needed. In public 
> cloud, kubernetes, and or some private datacenter service deployment options, 
> datanode servers may have their network identities (DNS name -> IP address 
> mapping) changed over time. We can and should avoid inappropriate caching 
> that may cause us to indefinitely use an incorrect address when contacting a 
> favored node. 
>  * Sometimes we use ISA when Address is just as good. For example, the dead 
> servers list. If we are going to pay some attention to ISA usage discipline, 
> let's remove the cases where we use ISA as a host and port pair but do not 
> need to do so. Address works just as well and doesn't present an opportunity 
> for misuse. Another example would be the RPC client concurrentCounterCache.
> Incorrect cases that cannot be fixed:
>  * hbase-external-blockcache: We have to resolve all of the memcached 
> locations up front because the memcached client constructor requires ISA 
> instances. So we have to hope that the network identities (DNS name -> IP 
> address mapping) does not change for any in the list. This is beyond our 
> control.
> While in this area it is trivial to add new client connect metrics for number 
> of potential nameservice lookups (whenever we instantiate an ISA) and number 
> of failed nameservice lookups (if the instantiated ISA is unresolved).
> While in this area I also noticed we often directly access a field in 
> ConnectionId where there is also a getter, so good practice is to use the 
> getter instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to