I would like to get preliminary feedback about the provided patch.

Discussion on net-dev@ 
https://mail.openjdk.org/pipermail/net-dev/2023-March/020682.html

One of the main issues I am trying to solve is how the cache handles intermittent 
DNS server outages caused by server overload or network connectivity problems.

At the moment this cache can be configured by the application using the 
following two properties:
   (1) "networkaddress.cache.ttl" (default 30 sec) - cache policy for successful lookups
   (2) "networkaddress.cache.negative.ttl" (default 10 sec) - cache policy for negative 
lookups
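
For reference, both are security properties (values in seconds, with -1 meaning 
"cache forever"). A minimal sketch of overriding them programmatically, assuming it 
runs before the first name lookup so the policy has not yet been read:

```java
import java.net.InetAddress;
import java.security.Security;

public class CachePolicyDemo {
    public static void main(String[] args) throws Exception {
        // Both are security properties; they must be set before the first lookup,
        // because the InetAddress cache reads the policy only once at initialization.
        Security.setProperty("networkaddress.cache.ttl", "30");
        Security.setProperty("networkaddress.cache.negative.ttl", "10");

        System.out.println(InetAddress.getByName("openjdk.org"));
    }
}
```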

The default timeout for positive responses is good enough to keep DNS records 
reasonably fresh while minimizing the number of requests to the DNS server.

But the cache for negative responses is problematic, and this is the problem I 
would like to solve. Caching a negative response means that for **10** seconds 
the application will not be able to connect to the server.
Possible solutions:
  1. Decrease the timeout for negative responses: unfortunately, sending more 
requests to the server during a DNS outage causes even more issues, since that is 
exactly the wrong moment to put extra load on the network/server.
  2. Increase the timeout for positive responses: this decreases the chance of 
hitting an error, but the cache will serve stale data for longer.
  3. This proposal: ignore the negative response and continue to use the result 
of the last successful lookup until some additional timeout expires.

The idea is to split the notion of the TTL from the timeout used by the cache. 
When the TTL of a record expires, we request new data from the server. If that 
request succeeds, we update the record; if it fails, we continue to use the 
cached data until the next sync.
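
A minimal sketch of how I read the idea, with hypothetical names (the patch uses 
its own classes): each positive entry keeps two deadlines, one that triggers the 
next sync and one hard limit after which stale data may no longer be served.

```java
final class StaleCapableEntry {
    interface Resolver { byte[][] lookup(String host) throws Exception; }

    private final long ttlNanos;           // refresh interval ("networkaddress.cache.ttl")
    private final long staleUntil;         // hard deadline, counted from insertion
    private volatile long refreshAfter;    // next moment a lookup should be attempted
    private volatile byte[][] addresses;   // last successfully resolved addresses

    StaleCapableEntry(byte[][] addresses, long ttlNanos, long extendedTtlNanos) {
        long now = System.nanoTime();
        this.addresses = addresses;
        this.ttlNanos = ttlNanos;
        this.refreshAfter = now + ttlNanos;
        this.staleUntil = now + extendedTtlNanos;
    }

    byte[][] get(Resolver resolver, String host) throws Exception {
        long now = System.nanoTime();
        if (now - refreshAfter < 0) {
            return addresses;                            // TTL not reached: serve cached data
        }
        try {
            byte[][] fresh = resolver.lookup(host);      // TTL reached: ask the server again
            addresses = fresh;
            refreshAfter = System.nanoTime() + ttlNanos; // schedule the next sync
            return fresh;
        } catch (Exception lookupFailed) {
            if (now - staleUntil < 0) {
                return addresses;                        // outage: keep serving the stale result
            }
            throw lookupFailed;                          // extended deadline passed: fail as before
        }
    }
}
```

Note that in this sketch only refreshAfter advances on a successful sync; 
staleUntil never moves, matching the hard-deadline behaviour described below.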

For example, if the new property "networkaddress.cache.extended.ttl" is set to 
10 minutes, then we will cache a positive response for 10 minutes but will try 
to sync it every 30 seconds. If the new property is not set, then, as before, we 
will cache a positive response for 30 seconds and then cache the negative 
response for 10 seconds.
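
To make the example concrete, here is how the hypothetical StaleCapableEntry 
sketched above would be parameterized; during a simulated outage inside the 
first 10 minutes it keeps returning the previously resolved addresses instead 
of surfacing the failure:

```java
import java.util.concurrent.TimeUnit;

class ExtendedTtlDemo {
    public static void main(String[] args) throws Exception {
        byte[][] cached = { {10, 0, 0, 1} };   // pretend 10.0.0.1 was resolved earlier

        // The example above: sync every 30 s, serve stale for at most 10 min.
        StaleCapableEntry entry = new StaleCapableEntry(
                cached,
                TimeUnit.SECONDS.toNanos(30),
                TimeUnit.MINUTES.toNanos(10));

        // A resolver that always fails simulates the DNS outage.
        StaleCapableEntry.Resolver failing = host -> { throw new Exception("DNS outage"); };

        // Within the first 30 s this returns the cached address without contacting the
        // resolver at all; between 30 s and 10 min a failed lookup falls back to the
        // same stale data; after 10 min the failure is propagated as before.
        System.out.println(entry.get(failing, "example.org").length);
    }
}
```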


RFC 8767 Serving Stale Data to Improve DNS Resiliency:
https://www.rfc-editor.org/rfc/rfc8767

Comments about the current and other possible implementations:
 * The code is intentionally moved to a separate ValidAddresses class, just to 
make it clear that the default configuration (when the new property is not set) 
is not changed much.
 * The refresh timeout includes the time spent on the server lookup. So if we 
have to refresh every 2 seconds but a lookup takes 3 seconds, we will hit the 
server on every lookup. Another implementation could spend the 3 seconds on the 
lookup and then additionally use the cached value for two seconds.
 * The extended timeout is a kind of "maximum stale timer" from the RFC above, 
but it starts counting not from the moment the record expired, but from the 
moment we added it to the cache. Another possible implementation could start 
counting from the moment the TTL expired, so the overall lifetime of the value 
would be the sum ttl + extended.
 * The extended timeout is a hard deadline which never changes during execution. 
For example, if it is set to 1 day, then by the end of that day we must make a 
successful lookup to re-cache the value, otherwise we start to use the 
"negative" cache. Other implementations could shift the expiration time on every 
successful sync (a small sketch contrasting these variants follows this list).
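
To make the last two points concrete, a compact comparison of the deadline 
variants; names and constants are mine, and only the first variant is what this 
patch does:

```java
import java.util.concurrent.TimeUnit;

class StaleDeadlineVariants {
    static final long TTL      = TimeUnit.SECONDS.toNanos(30);
    static final long EXTENDED = TimeUnit.DAYS.toNanos(1);

    // Variant in this patch: hard deadline counted from insertion, never moved.
    static long hardFromInsertion(long insertedAt) {
        return insertedAt + EXTENDED;              // total lifetime == EXTENDED
    }

    // Alternative 1: count from the moment the TTL expired.
    static long hardFromExpiry(long insertedAt) {
        return insertedAt + TTL + EXTENDED;        // total lifetime == TTL + EXTENDED
    }

    // Alternative 2: sliding deadline, pushed forward on every successful sync.
    static long sliding(long lastSuccessfulSyncAt) {
        return lastSuccessfulSyncAt + EXTENDED;    // never expires while syncs keep succeeding
    }
}
```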

Any thoughts about other possibilities?

-------------

Commit messages:
 -  the suggested cap is 7 days
 - simplify
 - comments
 - Merge remote-tracking branch 'upstream/master' into JDK-8304885
 - 8304885: Reuse stale data to improve DNS resolver resiliency

Changes: https://git.openjdk.org/jdk/pull/13285/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13285&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8304885
  Stats: 380 lines in 14 files changed: 316 ins; 45 del; 19 mod
  Patch: https://git.openjdk.org/jdk/pull/13285.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13285/head:pull/13285

PR: https://git.openjdk.org/jdk/pull/13285
