[
https://issues.apache.org/jira/browse/HBASE-25292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Kyle Purtell updated HBASE-25292:
----------------------------------------
Description:
We sometimes cache InetSocketAddress in data structures in an attempt to
optimize away potential nameservice (DNS) lookups. This is, in general, an
anti-pattern, because once an InetSocketAddress is resolved, resolution is
never attempted again. The ideal pattern for connect() is ISA instantiation
just before the connect() call, with no reuse of the ISA instance. For bind()
we presume the local identity won't change while the process is live so usage
and caching can be relaxed in that case.
If I can restate my proposal for a usage convention for InetSocketAddress, it
would be this: Network identities should be bound late. This means addresses
should be resolved at the last possible moment. Also, network identity mappings
can change, so our code should not inappropriately cache them; otherwise we
might miss a change and fail to operate normally.
I have reviewed the code for InetSocketAddress usage and in my opinion
sometimes we are caching ISA acceptably, and in other cases we are not.
Correct cases:
* We cache ISA for RPC connections, so we don't potentially do a lookup for
every Call. However, we resolve the address earlier than we need to. The code
can be improved by moving resolution to just before where we connect().
Incorrect cases that can be fixed:
* RPC stubs. Remote clients may be recycled and replaced with new instances
where the network identities (DNS name to IP address mapping) have changed--.
HBASE-14544 attempts to work around DNS instability in data centers of years
past in a way that, in my opinion, is the wrong thing to do in the modern era.
In modern datacenters, in public cloud, and especially in kubernetes
environments, DNS mappings are dynamic and subject to frequent change. It is
just never the right thing to do to cache them. I intend to propose a revert of
HBASE-14544. Reverting this simplifies some code a bit. That is the only
reason: this is in my opinion some legacy that can be dropped, one fewer
configuration variable (yay!), but if this part of the proposal is
controversial it can be skipped.
* RPC stubs again. When looking up the IP address of the remote host when
creating a stub key we also make a key even if the resolution fails. This is
the wrong thing to do. If we can't resolve the remote address, we can't contact
the server. Making a stub that can't communicate is pointless. Throw an
exception instead.
* Favored nodes. Although the HDFS API requires InetSocketAddress, we don't
have to make up a list right away and cache them forever. We can use Address to
record the list of favored nodes and convert from Address to InetSocketAddress
on demand (when we go to create the HFile). This will allow us to resolve
datanode hostnames just before they are needed. In public cloud, kubernetes,
and or some private datacenter service deployment options, datanode servers may
have their network identities (DNS name -> IP address mapping) changed over
time. We can and should avoid inappropriate caching that may cause us to
indefinitely use an incorrect address when contacting a favored node.
* Sometimes we use ISA when Address is just as good. For example, the dead
servers list. If we are going to pay some attention to ISA usage discipline,
let's remove the cases where we use ISA as a host and port pair but do not need
to do so. Address works just as well and doesn't present an opportunity for
misuse. Another example would be the RPC client concurrentCounterCache.
Incorrect cases that cannot be fixed:
* hbase-external-blockcache: We have to resolve all of the memcached locations
up front because the memcached client constructor requires ISA instances. So we
have to hope that the network identities (DNS name -> IP address mapping) does
not change for any in the list. This is beyond our control.
While in this area it is trivial to add new client connect metrics for number
of potential nameservice lookups (whenever we instantiate an ISA) and number of
failed nameservice lookups (if the instantiated ISA is unresolved).
While in this area I also noticed we often directly access a field in
ConnectionId where there is also a getter, so good practice is to use the
getter instead.
was:
We sometimes cache InetSocketAddress in data structures in an attempt to
optimize away potential nameservice (DNS) lookups. This is, in general, an
anti-pattern, because once an InetSocketAddress is resolved, resolution is
never attempted again. The ideal pattern for connect() is ISA instantiation
just before the connect() call, with no reuse of the ISA instance. For bind()
we presume the local identity won't change while the process is live so usage
and caching can be relaxed in that case.
If I can restate my proposal for a usage convention for InetSocketAddress, it
would be this: Network identities should be bound late. This means addresses
should be resolved at the last possible moment. Also, network identity mappings
can change, so our code should not inappropriately cache them; otherwise we
might miss a change and fail to operate normally.
I have reviewed the code for InetSocketAddress usage and in my opinion
sometimes we are caching ISA acceptably, and in other cases we are not.
Correct cases:
* We cache ISA for RPC connections, so we don't potentially do a lookup for
every Call. However, we resolve the address earlier than we need to. The code
can be improved by moving resolution to just before where we connect().
Incorrect cases that can be fixed:
* RPC stubs. Remote clients may be recycled and replaced with new instances
where the network identities (DNS name to IP address mapping) have changed--.
HBASE-14544 attempts to work around DNS instability in data centers of years
past in a way that, in my opinion, is the wrong thing to do in the modern era.
In modern datacenters, in public cloud, and especially in kubernetes
environments, DNS mappings are dynamic and subject to frequent change. I intend
to propose a revert of HBASE-14544. Reverting this simplifies some code a bit.
That is the only reason: this is in my opinion some legacy that can be dropped,
one fewer configuration variable (yay!), but if this part of the proposal is
controversial it can be skipped.
* RPC stubs again. When looking up the IP address of the remote host when
creating a stub key we also make a key even if the resolution fails. This is
the wrong thing to do. If we can't resolve the remote address, we can't contact
the server. Making a stub that can't communicate is pointless. Throw an
exception instead.
* Favored nodes. Although the HDFS API requires InetSocketAddress, we don't
have to make up a list right away and cache them forever. We can use Address to
record the list of favored nodes and convert from Address to InetSocketAddress
on demand (when we go to create the HFile). This will allow us to resolve
datanode hostnames just before they are needed. In public cloud, kubernetes,
and or some private datacenter service deployment options, datanode servers may
have their network identities (DNS name -> IP address mapping) changed over
time. We can and should avoid inappropriate caching that may cause us to
indefinitely use an incorrect address when contacting a favored node.
* Sometimes we use ISA when Address is just as good. For example, the dead
servers list. If we are going to pay some attention to ISA usage discipline,
let's remove the cases where we use ISA as a host and port pair but do not need
to do so. Address works just as well and doesn't present an opportunity for
misuse. Another example would be the RPC client concurrentCounterCache.
Incorrect cases that cannot be fixed:
* hbase-external-blockcache: We have to resolve all of the memcached locations
up front because the memcached client constructor requires ISA instances. So we
have to hope that the network identities (DNS name -> IP address mapping) does
not change for any in the list. This is beyond our control.
While in this area it is trivial to add new client connect metrics for number
of potential nameservice lookups (whenever we instantiate an ISA) and number of
failed nameservice lookups (if the instantiated ISA is unresolved).
While in this area I also noticed we often directly access a field in
ConnectionId where there is also a getter, so good practice is to use the
getter instead.
> Improve InetSocketAddress usage discipline
> ------------------------------------------
>
> Key: HBASE-25292
> URL: https://issues.apache.org/jira/browse/HBASE-25292
> Project: HBase
> Issue Type: Bug
> Components: Client, HFile
> Reporter: Andrew Kyle Purtell
> Assignee: Andrew Kyle Purtell
> Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0
>
>
> We sometimes cache InetSocketAddress in data structures in an attempt to
> optimize away potential nameservice (DNS) lookups. This is, in general, an
> anti-pattern, because once an InetSocketAddress is resolved, resolution is
> never attempted again. The ideal pattern for connect() is ISA instantiation
> just before the connect() call, with no reuse of the ISA instance. For bind()
> we presume the local identity won't change while the process is live so usage
> and caching can be relaxed in that case.
> If I can restate my proposal for a usage convention for InetSocketAddress, it
> would be this: Network identities should be bound late. This means addresses
> should be resolved at the last possible moment. Also, network identity
> mappings can change, so our code should not inappropriately cache them;
> otherwise we might miss a change and fail to operate normally.
> I have reviewed the code for InetSocketAddress usage and in my opinion
> sometimes we are caching ISA acceptably, and in other cases we are not.
> Correct cases:
> * We cache ISA for RPC connections, so we don't potentially do a lookup for
> every Call. However, we resolve the address earlier than we need to. The code
> can be improved by moving resolution to just before where we connect().
> Incorrect cases that can be fixed:
> * RPC stubs. Remote clients may be recycled and replaced with new instances
> where the network identities (DNS name to IP address mapping) have changed--.
> HBASE-14544 attempts to work around DNS instability in data centers of years
> past in a way that, in my opinion, is the wrong thing to do in the modern
> era. In modern datacenters, in public cloud, and especially in kubernetes
> environments, DNS mappings are dynamic and subject to frequent change. It is
> just never the right thing to do to cache them. I intend to propose a revert
> of HBASE-14544. Reverting this simplifies some code a bit. That is the only
> reason: this is in my opinion some legacy that can be dropped, one fewer
> configuration variable (yay!), but if this part of the proposal is
> controversial it can be skipped.
> * RPC stubs again. When looking up the IP address of the remote host when
> creating a stub key we also make a key even if the resolution fails. This is
> the wrong thing to do. If we can't resolve the remote address, we can't
> contact the server. Making a stub that can't communicate is pointless. Throw
> an exception instead.
> * Favored nodes. Although the HDFS API requires InetSocketAddress, we don't
> have to make up a list right away and cache them forever. We can use Address
> to record the list of favored nodes and convert from Address to
> InetSocketAddress on demand (when we go to create the HFile). This will allow
> us to resolve datanode hostnames just before they are needed. In public
> cloud, kubernetes, and or some private datacenter service deployment options,
> datanode servers may have their network identities (DNS name -> IP address
> mapping) changed over time. We can and should avoid inappropriate caching
> that may cause us to indefinitely use an incorrect address when contacting a
> favored node.
> * Sometimes we use ISA when Address is just as good. For example, the dead
> servers list. If we are going to pay some attention to ISA usage discipline,
> let's remove the cases where we use ISA as a host and port pair but do not
> need to do so. Address works just as well and doesn't present an opportunity
> for misuse. Another example would be the RPC client concurrentCounterCache.
> Incorrect cases that cannot be fixed:
> * hbase-external-blockcache: We have to resolve all of the memcached
> locations up front because the memcached client constructor requires ISA
> instances. So we have to hope that the network identities (DNS name -> IP
> address mapping) does not change for any in the list. This is beyond our
> control.
> While in this area it is trivial to add new client connect metrics for number
> of potential nameservice lookups (whenever we instantiate an ISA) and number
> of failed nameservice lookups (if the instantiated ISA is unresolved).
> While in this area I also noticed we often directly access a field in
> ConnectionId where there is also a getter, so good practice is to use the
> getter instead.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)