The series is a compilation of patches with the upstream re-resolve feature from Nginx Plus. The original commits were rebased on top of the current OSS code, grouped by the features they introduce, and squashed. Some formatting quirks and other minor oddities can be attributed to a conscious effort to reduce divergence from the source branch.
The last couple of patches in the series are new code that allows sharing name resolution tasks between all the workers.

Known issues and TODOs:
===

- The whole series is known to be broken on win32 with multiple worker processes, as it relies on the ngx_worker value to keep track of the locality of data. Initializing ngx_worker to a correct value should address that. 'noreuse' zones also seem to be unsupported on this platform, so configuration reload may fail.
- The functionality requires a shared zone of sufficient size to be configured in the upstream block. A rough estimate is 2k per configured server entry + 2k for each resolved address. The zone requirement could be lifted with local allocation of the resolved peer data, but implementing that was out of scope.
- Resolved peer addresses are not carried over to a new generation of workers during configuration reload (see below).
- Tests still require some cleanup and will be published later.

Peer list population delay
===

In the cases of a cold start, a reload or a binary upgrade, the upstreams that contain only resolvable servers will have an empty list of peers. This leads to a short delay before Nginx is able to send traffic to the upstream.

There's no perfect solution for that: if the server list in the configuration has changed, it's no longer compatible with the data we collected for the previous config. If the resolver parameters were modified, we may get an entirely different set of servers.

The following options were considered:

- Publishing the preresolve code from Nginx Plus as is. The solution involves copying peer states from the non-reusable zone of the previous generation of workers. This only addresses the reload case and may result in stale peer data if the configuration changes. The advantage of this code is that it is heavily tested and has been running in multiple production environments for many years.
- Sharing the zone between all generations of workers.
  This requires some changes in the code, notably improving reference counting and cleanup for peer data in the shared zone (as we're no longer able to discard the old zone with all the allocated data) and tracking the upstream configuration compatibility. It also doesn't work when the zone size has changed in the config. The approach leads to increased memory requirements: the zone size should be configured to accommodate multiple generations of workers, and we are aware of deployments that have lots of those due to long-lived connections. Nginx OSS does not offer any means to monitor shared memory usage at the moment, so I fear this approach will hurt a lot of unsuspecting users who haven't reserved enough memory. There are also performance concerns, as access to the same list of peers from multiple generations of workers would increase lock contention (and the situation is already not looking good with round-robin lb). We can copy the peers instead of attempting to reuse them, but that prevents us from optimizing the memory usage.
- Queueing the requests until we finish the initial cycle of name resolution ('queue' directive of the ngx_http_upstream_module). This option adds a latency spike at the moment of configuration reload. There's also an issue with propagation of the upstream readiness state to all the worker processes: we need an event passing channel to be able to resume queued requests immediately. On the positive side, this would mitigate downtime for all 3 scenarios, as long as the queue capacity is sufficient. Given the latency spike, it doesn't seem to be a good standalone solution, but it might be a nice addition to one of the options above.

Alternatives like pre-resolving servers during configuration load were not considered due to complexity and significant disadvantages.
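
To illustrate the zone requirement from the known issues list, a minimal configuration sketch could look like the one below. It assumes the series keeps the directive and parameter names documented for Nginx Plus ('zone', 'resolver', and the 'resolve' parameter of 'server'). The host name, the resolver address, the 'valid' interval and the figure of 8 resolved addresses are made up for the example. With one server entry and up to 8 addresses, the rough estimate gives 2k + 8 * 2k = 18k, so 64k is picked here only to leave some headroom.

```nginx
http {
    # Hypothetical resolver address; 'valid' overrides the TTL from DNS.
    resolver 192.0.2.53 valid=30s;

    upstream backend {
        # The shared zone is required for re-resolved peers.
        # Rough estimate: 2k per server entry + 2k per resolved address.
        # 1 entry + 8 addresses (assumed) = 18k; 64k leaves headroom.
        zone backend 64k;

        # 'resolve' asks for the name to be re-resolved periodically.
        server backend.example.com resolve;
    }

    server {
        listen 8080;

        location / {
            proxy_pass http://backend;
        }
    }
}
```

With a configuration like this, the upstream contains only a resolvable server, so it would start with an empty peer list after a cold start, a reload or a binary upgrade, which is exactly the case discussed above.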

Maxim, from the list archives I understand that you had a negative opinion of the current approach with noreuse zones and pre-resolve, but I'm afraid there wasn't enough context to understand all the sides of that discussion. I'd appreciate it if you could share your thoughts on the problem and on the approach you consider architecturally correct.