On 2/5/2023 7:01 PM, J Carter wrote:
Hi Aleksei,

Why not permanently assign the task of resolving a given upstream server group (all servers/peers within it) to a single worker?

It seems that this approach would resolve the SRV issues, and remove the need for the shared queue of tasks.

The load would still be spread evenly for the most realistic scenarios - which is where there are many upstream server groups of few servers, as opposed to few upstream server groups of many servers.

The intent of the change was exactly the opposite: to avoid any permanent assignment of periodic tasks to a worker and to allow other processes to resume resolving if the original assignee exits, whether normally or abnormally. I'm not even doing enough for that -- I should've kept in-progress tasks at the end of the queue with expires = resolver timeout + a small constant, and retried from another process when the timeout is reached, but the idea was abandoned for a minuscule improvement of insertion time. I expect to be asked to reconsider, as patch 6/6 does not cover all the possible situations where we want to recover a stale task.
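To make the abandoned idea concrete, here is a minimal sketch of it, not the code from the patch series. All names (task_t, task_queue_t, queue_claim, queue_complete) and the constants are hypothetical, the queue is a plain sorted array rather than whatever structure the patches use, and the locking that a shared memory zone would need is omitted. A claimed task stays in the queue with expires = resolver timeout + a small constant, so if its owner exits, any other worker picks it up once that deadline passes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names and values throughout; none of these are nginx APIs. */
#define TASK_MAX             8
#define RESOLVER_TIMEOUT_MS  30000
#define STALE_SLACK_MS       1000

typedef struct {
    int       id;       /* which name to resolve */
    uint64_t  expires;  /* ms: next due time, or when a claim goes stale */
    int       owner;    /* worker holding the task, -1 if unclaimed */
} task_t;

typedef struct {
    task_t  tasks[TASK_MAX];  /* sorted by expires, ascending */
    size_t  n;
} task_queue_t;

/* Sorted insert; in shared memory this would run under the zone lock. */
void
queue_insert(task_queue_t *q, task_t t)
{
    size_t  i;

    assert(q != NULL && q->n < TASK_MAX);

    i = q->n;
    while (i > 0 && q->tasks[i - 1].expires > t.expires) {
        q->tasks[i] = q->tasks[i - 1];
        i--;
    }
    q->tasks[i] = t;
    q->n++;
}

/* Remove and return the task at position pos, closing the gap. */
task_t
queue_remove_at(task_queue_t *q, size_t pos)
{
    task_t  t;
    size_t  i;

    assert(q != NULL && pos < q->n);

    t = q->tasks[pos];
    for (i = pos + 1; i < q->n; i++) {
        q->tasks[i - 1] = q->tasks[i];
    }
    q->n--;
    return t;
}

/*
 * Take the head task if it is due. The task is not dropped from the
 * queue: it is re-queued with a stale deadline, so a claim left behind
 * by a dead worker becomes due again and can be taken by anyone.
 * Returns the task id, or -1 if nothing is due.
 */
int
queue_claim(task_queue_t *q, int worker, uint64_t now)
{
    task_t  t;

    assert(q != NULL);

    if (q->n == 0 || q->tasks[0].expires > now) {
        return -1;
    }

    t = queue_remove_at(q, 0);
    t.owner = worker;
    t.expires = now + RESOLVER_TIMEOUT_MS + STALE_SLACK_MS;
    queue_insert(q, t);
    return t.id;
}

/*
 * Finish a task and schedule its next run. Only the current owner may
 * complete it, so a late reply in a worker that lost the claim is ignored.
 * Returns 0 on success, -1 if the caller does not own the task.
 */
int
queue_complete(task_queue_t *q, int id, int worker, uint64_t now,
    uint64_t interval)
{
    task_t  t;
    size_t  i;

    assert(q != NULL);

    for (i = 0; i < q->n; i++) {
        if (q->tasks[i].id == id && q->tasks[i].owner == worker) {
            t = queue_remove_at(q, i);
            t.owner = -1;
            t.expires = now + interval;
            queue_insert(q, t);
            return 0;
        }
    }
    return -1;
}
```

The cost is visible in queue_claim: every claim pays for a second ordered insertion, which is the insertion time mentioned above.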

A permanent assignment of a whole upstream would also require either notifying other processes that the upstream is no longer assigned if the worker exits, or consistently recovering that assignment over a restart of a single worker (e.g. after a crash - not a regular situation, but one we should take into account nonetheless). And the benefit is not quite obvious - I mentioned that resolving SRVs with a lot of records may take longer to update the list of peers, but the situation with contention is not expected to change significantly* if we pin these tasks to a single worker, as another worker may be doing the same for another upstream. Most importantly, this isn't even a bottleneck. It only slightly exacerbates an existing problem with certain balancers that already suffer from the overuse of locks, in a configuration that was specifically crafted to amplify and highlight the difference and is far from the most realistic scenarios you describe.

* Pending verification on a performance test stand.
_______________________________________________
nginx-devel mailing list
nginx-devel@nginx.org
https://mailman.nginx.org/mailman/listinfo/nginx-devel
