Good weekend, everyone! Let me start by describing the problem and then move
on to the proposed solution.
Problem:
Currently we have a number of PoPs (Points-of-Presence) around the world
running Linux/nginx that do TCP/TLS/HTTP termination. There we re-encrypt
traffic and proxy_pass it to an upstream block with a HUGE set of servers.
The whole idea of those PoP nginxes is to maintain a pool of keepalive
connections with enormous TCP windows to the upstreams.
In reality, though, we cannot use any of nginx's connection balancing
methods, because they almost never reuse connections (again, our upstream
list is huge). Also, each worker has its own keepalive pool, which makes the
situation even worse. Of course we could generate per-server config files
and give each server in each PoP a different (and small) set of upstream
servers, but that solution sounds awfully “clunky”.
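To illustrate, here is a hypothetical, stripped-down PoP config in the spirit
of the setup described above (the upstream name, address, and numbers are all
made up; certificates etc. omitted):

    upstream dc_backends {
        least_conn;                      # almost never reuses connections here
        server 10.0.0.1:443;
        # ...hundreds more servers...
        keepalive 32;                    # per-worker cache, not shared
    }

    server {
        listen 443 ssl;
        location / {
            proxy_pass https://dc_backends;  # re-encrypt towards the upstream
            proxy_http_version 1.1;
            proxy_set_header Connection "";  # needed for upstream keepalive
        }
    }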
Solution:
IPVS, for example, has among its numerous job scheduling modes a
Locality-Based Least-Connection Scheduling [1] that looks quite close to what
we want. The only problem is that if all the worker processes on all our
boxes around the world used the same ordered list of upstreams, they would
quickly overload the first upstream, then the second, and so on. Therefore
I've added a randomized mode in which each worker starts filling upstreams
from some random starting point. That should give good locality for TCP
connection reuse and, as the law of large numbers implies, good enough load
distribution across upstreams globally.
Implementation:
PoC:
coloured: https://gist.github.com/SaveTheRbtz/d6a505555cd02cb6aee6
raw:
https://gist.githubusercontent.com/SaveTheRbtz/d6a505555cd02cb6aee6/raw/5aba3b0709777d2a6e99217bd3e06e2178846dc4/least_conn_locality_randomized.diff
It basically tries to find the first not-fully-loaded peer (starting from a
per-worker random offset in the randomized variant), and if that fails it
falls back to normal least_conn; a simplified model is sketched below.
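To make the selection logic concrete, here is a simplified, self-contained C
model of it (this is not the patch itself: NPEERS, MAX_CONNS and the conns[]
counters are made-up stand-ins for nginx's peer data):

    /* Toy model of the PoC's peer selection: each worker picks a random
     * starting offset once, scans for the first peer that is not fully
     * loaded, and falls back to plain least_conn otherwise. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>

    #define NPEERS    16   /* made-up number of upstream peers */
    #define MAX_CONNS  4   /* made-up "fully loaded" threshold per peer */

    static unsigned conns[NPEERS];   /* active connections per peer */

    static int
    pick_peer(unsigned start)
    {
        unsigned  i, n;
        int       best = -1;

        /* Locality pass: starting from this worker's random offset, take
         * the first peer that still has capacity. */
        for (i = 0; i < NPEERS; i++) {
            n = (start + i) % NPEERS;
            if (conns[n] < MAX_CONNS) {
                return (int) n;
            }
        }

        /* Fallback: ordinary least_conn across all peers. */
        for (i = 0; i < NPEERS; i++) {
            if (best == -1 || conns[i] < conns[best]) {
                best = (int) i;
            }
        }

        return best;
    }

    int
    main(void)
    {
        unsigned  start;
        int       r, p;

        /* Each worker seeds its own random starting point at startup. */
        srand((unsigned) time(NULL) ^ (unsigned) getpid());
        start = (unsigned) rand() % NPEERS;

        for (r = 0; r < 40; r++) {
            p = pick_peer(start);
            conns[p]++;
            printf("request %2d -> peer %2d (%u conns)\n", r, p, conns[p]);
        }

        return 0;
    }

With several such "workers" started with different seeds, each one fills a
different MAX_CONNS-sized chunk of the peer list, which is exactly the
locality-plus-spreading behaviour described above.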
Followup questions:
Does anyone in the community have similar use cases? CloudFlare maybe?
Is Nginx Inc. interested in incorporating a patch like that, or is it too
specific to our workflow? Should I prettify the PoC, or should I just throw
the ball your way?
Alternative solution:
The original upstream keepalive module [2] had a “single” keyword that would
also suit our needs, though it was removed because, to quote Maxim Dounin:
    The original idea was to optimize edge cases in case of interchangeable
    backends, i.e. don't establish a new connection if we have any one
    cached. This causes more harm than good though, as it screws up
    underlying balancer's idea about backends used and may result in
    various unexpected problems.
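For reference, as far as I remember “single” was a parameter of the keepalive
directive in that module, used roughly like this (syntax from memory, so
please check the module's README):

    upstream dc_backends {
        server 10.0.0.1:443;
        keepalive 32 single;   # treat all cached connections as interchangeable
    }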
[1] http://kb.linuxvirtualserver.org/wiki/Locality-Based_Least-Connection_Scheduling
[2] http://mdounin.ru/hg/ngx_http_upstream_keepalive/
