> I would like to get preliminary feedback about the provided patch.
>
> Discussion on net-dev@
> https://mail.openjdk.org/pipermail/net-dev/2023-March/020682.html
>
> One of the main issues I am trying to solve is how the cache handles
> intermittent DNS server outages caused by overloading or network connection
> problems.
>
> At the moment this cache can be configured by the application using the
> following two properties:
>    (1) "networkaddress.cache.ttl"(30 sec) - cache policy for successful
> lookups
>    (2) "networkaddress.cache.negative.ttl"(10 sec) - cache policy for
> negative lookups
>
> The default timeout for positive responses is good enough to "have recent
> dns-records" and to "minimize the number of requests to the DNS server".
>
> But the cache for negative responses is problematic, and this is the problem
> I would like to solve. Caching the negative response means that for **10**
> seconds the application will not be able to connect to the server.
>
> Possible solutions:
>   1. Decreasing the timeout "for the negative responses": unfortunately,
> more requests to the server at the moment of a "DNS-outage" cause even more
> issues, since this is not the right moment to load the network/server more.
>   2. Increasing the timeout "for the positive responses": this will decrease
> the chance of getting an error, but the cache will use stale data for
> longer.
>   3. This proposal: it would be good to ignore the negative response and
> continue to use the result of the last "successful lookup" until some
> additional timeout.
>
> The idea is to split the notion of the TTL and the timeout used for the
> cache. When the TTL for a record expires, we should request new data from
> the server. If this request succeeds we will update the record; if it fails
> we will continue to use the cached data until the next sync.
>
> For example, if the new property "networkaddress.cache.extended.ttl" is set
> to 10 minutes, then we will cache a positive response for 10 minutes but will
> try to sync it every 30 seconds. If the new property is not set then, as
> before, we will cache a positive response for 30 seconds and then cache the
> negative response for 10 seconds.
>
>
> RFC 8767 Serving Stale Data to Improve DNS Resiliency:
> https://www.rfc-editor.org/rfc/rfc8767
>
> Comments about current and other possible implementations:
>  * The code was intentionally moved to a separate ValidAddresses class, to
> make it clear that the default configuration, when the new property is not
> set, is not changed much.
>  * The refresh timeout includes the time spent on the server lookup. So if we
> have to refresh every 2 seconds, but we spend 3 seconds in the lookup, then
> we will request the server on each lookup. Another implementation may spend 3
> seconds on the lookup and then additionally use the cached value for two
> seconds.
>  * The extended timeout is a kind of "maximum stale timer" from the RFC
> above, but it starts counting not from the moment the record expired, but
> from the moment we added it to the cache. Another possible implementation may
> start counting from the moment the TTL expired, so the overall lifetime of
> the value would be the sum ttl+extended.
>  * The extended timeout has a hard deadline which is never changed during
> execution. For example, if it is set to 1 day, then at the end of the day we
> must make a successful lookup to recache the value, otherwise we will start
> to use the "negative" cache. Other implementations may shift the expiration
> time on every successful sync.
>
> Any thoughts about other possibilities?
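
To make the quoted description easier to discuss, here is a minimal sketch of the refresh/expiry split it describes. This is not the code from the patch: the class `StaleCacheSketch`, its fields, and the injected `resolver` and `clock` are hypothetical names chosen for illustration, and negative caching is left out for brevity. It assumes the variant from the description above, where the hard deadline is fixed when the entry is first cached and the refresh time is measured from the start of the lookup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical illustration only; not the ValidAddresses class from the patch. */
final class StaleCacheSketch {

    /** One cached answer with its two deadlines, in milliseconds. */
    private static final class Entry {
        String address;        // last successfully resolved value
        long refreshAt;        // when we should ask the resolver again
        final long expiresAt;  // hard deadline, fixed when the entry is created

        Entry(String address, long refreshAt, long expiresAt) {
            this.address = address;
            this.refreshAt = refreshAt;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Entry> cache = new HashMap<>();
    private final Function<String, String> resolver; // assumed to throw RuntimeException on failure
    private final LongSupplier clock;                // current time in milliseconds
    private final long ttlMillis;                    // plays the role of networkaddress.cache.ttl
    private final long extendedTtlMillis;            // plays the role of the proposed extended ttl

    StaleCacheSketch(Function<String, String> resolver, LongSupplier clock,
                     long ttlMillis, long extendedTtlMillis) {
        this.resolver = resolver;
        this.clock = clock;
        this.ttlMillis = ttlMillis;
        this.extendedTtlMillis = extendedTtlMillis;
    }

    synchronized String lookup(String host) {
        // Taken before the lookup, so the refresh interval includes lookup time.
        long now = clock.getAsLong();
        Entry e = cache.get(host);
        if (e != null && now < e.refreshAt && now < e.expiresAt) {
            return e.address; // still fresh, no request to the server
        }
        try {
            String fresh = resolver.apply(host);
            if (e == null || now >= e.expiresAt) {
                // First lookup, or the hard deadline passed: start a new entry.
                cache.put(host, new Entry(fresh, now + ttlMillis, now + extendedTtlMillis));
            } else {
                // Successful sync: update the value, keep the original deadline.
                e.address = fresh;
                e.refreshAt = now + ttlMillis;
            }
            return fresh;
        } catch (RuntimeException failure) {
            if (e != null && now < e.expiresAt) {
                // Lookup failed: keep serving the stale value until the next sync.
                e.refreshAt = now + ttlMillis;
                return e.address;
            }
            cache.remove(host);
            throw failure; // nothing usable; a real cache would store a negative entry here
        }
    }
}
```

With a 30 second TTL and a 10 minute extended TTL, a resolver that starts failing after the first successful lookup is retried at most once per 30 seconds, and callers keep receiving the last good address until 10 minutes after it was first cached.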
Sergey Bylokhov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision:

 - Merge remote-tracking branch 'upstream/master' into JDK-8304885
 - Use "maximum stale timer" instead of the extended timeout, and bump it on each successful lookup
 - the suggested cap is 7 days
 - simplify
 - comments
 - Merge remote-tracking branch 'upstream/master' into JDK-8304885
 - 8304885: Reuse stale data to improve DNS resolver resiliency

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/13285/files
  - new: https://git.openjdk.org/jdk/pull/13285/files/c9b0d79b..4db0216c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13285&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13285&range=01-02

  Stats: 10481 lines in 234 files changed: 2916 ins; 7145 del; 420 mod
  Patch: https://git.openjdk.org/jdk/pull/13285.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13285/head:pull/13285

PR: https://git.openjdk.org/jdk/pull/13285