[ Long message and proposal follows. Bear with me. There are a lot of words,
but that is because we need a lot of help/input! ;-) ]
So, this has come up in the past several times, and we discussed it again this
year at ApacheCon: How do we get the load balancer to make smarter, more
informed decisions about where to send traffic?
The different LB methods take different approaches to balancing traffic, but
ultimately none of them is "smart" about its decisions. Other than a member
being in error state, the balancer makes its decision solely based on
configuration (LB set, factor, etc.) and its own knowledge of the member (e.g.
requests, bytes). What we have often discussed is a way to get some type of
health/load/capacity information from the backend to make informed balancing
decisions.
One method is to use health checks (a la haproxy, AWS ELBs, etc.) that request
one or more URLs and the response code/time indicates whether or not the
service is up and available, allowing more proactive decisions. While this is
better than our current state of reactively marking members in error state
based on failed requests, it still provides a limited view of the health/state
of the backend.
We have also discussed implementing a way for backends to communicate a magical
"load" number to the frontend for it to take into account as it balances traffic.
This would give a much better view into the backend's state, but it requires each
backend system/server/service/app to come up with that calculation itself. It
would then have to be implemented in all the various backends (e.g. httpd,
tomcat, php-fpm, unicorn, mongrel, etc.), probably a hard sell to all of those
projects. And the frontend would have limited control over what that number
means or how to use it.
During JimJag's balancer talk at ApacheCon this year, he brought up this issue
of "better, more informed" decision making again. I put some thought into it
that night and came up with some ideas. Jim, Covener, Trawick, Ruggeri, and I
then spent some time over the next couple of days talking it through and
fleshing out some of the details.
Based on all of that, below is what I am proposing. I have some initial code
that I am working on to implement the different pieces of this, and I will put
them up in bugz or somewhere when they're a little less rudimentary.
--
Our hope is to create a general standard that can be used by various projects,
products, proxies, servers, etc., to have a more consistent way for a load
balancer to request and receive useful internal state information from its
backend nodes. This information can then be used by the *frontend*
software/admin (this is the main change from what we have discussed before) to
calculate a load factor appropriate for each backend node.
This communication uses a new, standard HTTP header, "X-Backend-Info", which
takes the following form in RFC 2616 augmented BNF:
   backend-info    = "version" "=" ( float | <"> float <"> )
                     [
                         *LWS "," *LWS
                         #( numeric-entry | string-entry )
                     ]

   numeric-entry   = numeric-field "=" ( float | <"> float <"> )
                     ; that is, numbers may optionally be enclosed in
                     ; quotation marks

   float           = 1*DIGIT [ "." 1*DIGIT ]

   numeric-field   = "workers-max"
                     ; maximum number of workers the backend supports
                   | "workers-used"
                     ; current number of used/busy workers
                   | "workers-allocated"
                     ; current number of allocated/ready workers
                   | "workers-free"
                     ; current number of workers available for use
                     ; (generally the difference between workers-max and
                     ; workers-used, though some implementations may have
                     ; a different notion)
                   | "uptime"
                     ; number of seconds the backend has been running
                   | "requests"
                     ; number of requests the backend has processed
                   | "memory-max"
                     ; total amount of memory available in bytes
                   | "memory-used"
                     ; amount of used memory in bytes
                   | "memory-allocated"
                     ; amount of allocated/committed memory in bytes
                   | "memory-free"
                     ; amount of memory available for use (generally
                     ; the difference between memory-max and memory-used,
                     ; though some implementations may have a different
                     ; notion)
                   | "load-current"
                     ; the (subjective) current load for the backend
                   | "load-5"
                     ; the (subjective) 5-minute load for the backend
                   | "load-15"
                     ; the (subjective) 15-minute load for the backend

   string-entry    = string-field "=" ( token | quoted-string )

   string-field    = "provider"
                     ; informational description of backend information
                     ; provider (module, container, subsystem, app, etc.)
As used here, "worker" is an overloaded term whose precise meaning is
backend-dependent. It might refer to processes, threads, pipelines, or whatever
the backend system/server/service/app uses to measure or limit its number of
active, processing connections.
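
To give a feel for how simple consuming this is meant to be, here is a minimal,
hedged sketch (plain C, standard library only, nothing from any actual
implementation) of splitting such a header value into key/value pairs. It
assumes the simple comma-separated key=value form above and does not handle
commas inside quoted strings:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Split a backend-info header value such as
     *   "version=1.0, workers-max=256, workers-free=255"
     * into key/value pairs and hand each pair to the callback.
     * Optional surrounding quotes are stripped from values.
     */
    static void parse_backend_info(const char *value,
                                   void (*cb)(const char *key, const char *val))
    {
        char *copy = strdup(value);
        char *saveptr = NULL;
        char *entry;

        for (entry = strtok_r(copy, ",", &saveptr);
             entry != NULL;
             entry = strtok_r(NULL, ",", &saveptr)) {
            char *eq, *val;
            size_t len;

            while (*entry == ' ' || *entry == '\t') {
                entry++;                      /* trim leading whitespace */
            }

            eq = strchr(entry, '=');
            if (!eq) {
                continue;                     /* malformed entry; skip it */
            }
            *eq = '\0';
            val = eq + 1;

            len = strlen(val);
            if (len >= 2 && val[0] == '"' && val[len - 1] == '"') {
                val[len - 1] = '\0';          /* strip optional quotes */
                val++;
            }

            cb(entry, val);
        }
        free(copy);
    }

    static void print_pair(const char *key, const char *val)
    {
        printf("%s => %s\n", key, val);
    }

    int main(void)
    {
        parse_backend_info("version=1.0, provider=\"Backend X\", "
                           "workers-max=1000, workers-used=517", print_pair);
        return 0;
    }
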
The process-flow looks like this:
1. The frontend (periodically based on time or requests, or on demand) adds an
"X-Backend-Info" request header to a backend request, either as part of a
normal proxied request or as part of a dedicated health check, informing the
backend that it wants node state information. I.e.:
X-Backend-Info: version=1.0
2. The backend node receives a request with an "X-Backend-Info" header
specifying a version it supports.
3. A supporting backend node SHOULD insert one or more "X-Backend-Info"
response headers containing the required "version" field and any subset of the
other backend-info fields that it supports. The version of information provided
MUST be less than or equal to the version requested. (The fields are
standardized so that various frontends know what to expect, rather than each
backend system/server/service/app creating its own fields/values.) E.g.:
X-Backend-Info: version=1.0, provider="Backend X", workers-max=1000,
                workers-used=517, workers-free=483, uptime=19234,
                requests=85939
4. The backend MUST add the "X-Backend-Info" token to the "Connection" response
header, making it a hop-by-hop field that is removed by the frontend from the
downstream response (RFC2616 14.10 and RFC7230 6.1). [Note there appears to be
an httpd bug here that I intend to submit and that needs to be addressed.]
Connection: X-Backend-Info
5. The frontend parses the backend-info entries in the received
"X-Backend-Info" response header. The values are then used as part of either an
internal or an administrator-specified calculation to determine the load factor
or weight of that node for subsequent requests (a rough sketch of such a
calculation follows this list).
6. The frontend MUST remove the "X-Backend-Info" hop-by-hop response header per
the RFCs referenced above.
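
To illustrate step 5, here is a hedged sketch of one possible "internal"
calculation. The 1-100 scaling simply matches the range documented for
lbfactor, and the policy itself (fraction of free workers) is only an example,
not part of the proposal:

    #include <stdio.h>

    /* One possible internal calculation for step 5: weight a member by the
     * fraction of its workers that are free, scaled into the 1-100 range
     * used by lbfactor.  Field names follow the grammar above.
     */
    static int calc_lbfactor(double workers_free, double workers_max)
    {
        int factor;

        if (workers_max <= 0.0) {
            return 1;               /* no usable data: minimal weight */
        }

        factor = (int)(workers_free / workers_max * 100.0);
        if (factor < 1) {
            factor = 1;             /* keep the member selectable */
        }
        else if (factor > 100) {
            factor = 100;
        }
        return factor;
    }

    int main(void)
    {
        /* Using the example response header from step 3:
         * workers-free=483, workers-max=1000  ->  lbfactor 48 */
        printf("lbfactor = %d\n", calc_lbfactor(483.0, 1000.0));
        return 0;
    }
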
--
As for the httpd implementation, there are two pieces. The first is when httpd
is used as a backend node behind a load balancer and must provide
X-Backend-Info response data. For this, I have created a module tentatively
named mod_proxy_backend_info that does nothing except insert an output filter
to populate the response header with version, provider, workers-*, requests,
uptime, and load-* values when the request header is present. Here is an
example request-response:
% curl -v -H 'X-Backend-Info: version=1.0' http://localhost/
* Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.41.0
> Host: localhost
> Accept: */*
> X-Backend-Info: version=1.0
>
< HTTP/1.1 200 OK
< Date: Thu, 30 Apr 2015 04:32:08 GMT
< Server: Apache/2.4.9 (Unix) PHP/5.5.14
< Last-Modified: Wed, 15 Apr 2015 14:04:54 GMT
< ETag: "2d-513c3d4d78d80"
< Accept-Ranges: bytes
< Content-Length: 45
< X-Backend-Info: version=1.0, provider="mod_proxy_backend_info [Apache/2.4.9
(Unix) PHP/5.5.14]", workers-max=256, workers-used=1, workers-allocated=4,
workers-free=255, uptime=1448, requests=3, load-current=1.737305,
load-5=1.733887, load-15=1.668457
< Connection: X-Backend-Info
< Content-Type: text/html
<
<html><body><h1>It works!</h1></body></html>
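
For illustration only, here is a rough sketch of the filter idea behind that
module. This is not the actual mod_proxy_backend_info code: the worker counts
come from a placeholder stub rather than the scoreboard, it reports only a few
fields, and version negotiation and error handling are omitted:

    #include "httpd.h"
    #include "http_config.h"
    #include "http_protocol.h"
    #include "util_filter.h"
    #include "apr_strings.h"

    /* Hypothetical stand-in for a scoreboard walk */
    static void get_worker_counts(int *max, int *used)
    {
        *max = 256;
        *used = 1;
    }

    static apr_status_t backend_info_out_filter(ap_filter_t *f,
                                                apr_bucket_brigade *bb)
    {
        request_rec *r = f->r;
        const char *req = apr_table_get(r->headers_in, "X-Backend-Info");

        if (req) {
            int max, used;
            get_worker_counts(&max, &used);
            apr_table_setn(r->headers_out, "X-Backend-Info",
                           apr_psprintf(r->pool,
                                        "version=1.0, provider=\"%s\", "
                                        "workers-max=%d, workers-used=%d, "
                                        "workers-free=%d",
                                        ap_get_server_description(),
                                        max, used, max - used));
            /* Make it hop-by-hop so the frontend strips it (step 4) */
            apr_table_mergen(r->headers_out, "Connection", "X-Backend-Info");
        }

        /* Only needed once per response */
        ap_remove_output_filter(f);
        return ap_pass_brigade(f->next, bb);
    }

    static void insert_filter(request_rec *r)
    {
        ap_add_output_filter("BACKEND_INFO_SKETCH", NULL, r, r->connection);
    }

    static void register_hooks(apr_pool_t *p)
    {
        ap_register_output_filter("BACKEND_INFO_SKETCH",
                                  backend_info_out_filter,
                                  NULL, AP_FTYPE_CONTENT_SET);
        ap_hook_insert_filter(insert_filter, NULL, NULL, APR_HOOK_MIDDLE);
    }

    module AP_MODULE_DECLARE_DATA backend_info_sketch_module = {
        STANDARD20_MODULE_STUFF,
        NULL, NULL, NULL, NULL, NULL,
        register_hooks
    };
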
The second piece is when httpd is used as the load balancer. For this, I have
created a module tentatively named mod_lbmethod_bybackendinfo that will:
1. Periodically (based on elapsed time, number of requests, or both since the
last update) insert the X-Backend-Info request header into a proxied request
(a rough sketch of this check follows the list).
2. Parse and remove the X-Backend-Info response header.
3. Calculate the member's "informed" load factor based on a formula specified
by the user/admin in the configuration. I hope to just use the existing
lbfactor field to store this calculated value. Then we can use existing logic
to balance based on lbset and lbfactor for subsequent requests.
4. Store the current time and request count in the member's data structure so
the lbmethod knows when it needs to be updated again.
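
To make item 1 concrete, here is a hedged sketch of the "is it time to refresh
this member?" check. It uses plain C with a hypothetical bookkeeping struct
rather than the real proxy_worker fields, and the directive names match the
proposed backendupdateseconds/backendupdaterequests settings below:

    #include <stdio.h>
    #include <time.h>

    /* Hypothetical per-member bookkeeping; the real module would keep this
     * (or something like it) alongside the existing proxy_worker data.
     */
    typedef struct {
        time_t        last_update;        /* when backend info was last refreshed */
        unsigned long requests_at_update; /* member request count at that moment  */
    } backend_info_state;

    /* Decide whether the next proxied request to this member should carry
     * the X-Backend-Info request header, based on the two thresholds.
     */
    static int needs_refresh(const backend_info_state *st,
                             unsigned long current_requests,
                             int update_seconds, int update_requests,
                             time_t now)
    {
        if (update_seconds > 0 && now - st->last_update >= update_seconds) {
            return 1;
        }
        if (update_requests > 0 &&
            current_requests - st->requests_at_update
                >= (unsigned long)update_requests) {
            return 1;
        }
        return 0;
    }

    int main(void)
    {
        backend_info_state st = { time(NULL) - 45, 1000 };

        /* 45 seconds and 20 requests since the last refresh, with the
         * example thresholds of 30 seconds / 100 requests: time wins. */
        printf("refresh? %d\n",
               needs_refresh(&st, 1020, 30, 100, time(NULL)));
        return 0;
    }
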
What I need from all of you:
- Input/commentary on the proposed idea, approach, and implementation. Renaming
things, additional fields that might be useful, etc., are all up for discussion.
- Help with handling the configuration formula mentioned in #3 above. Can we
just add some math operators to the expression parser to handle this? What all
operations/functions might we need (+-*/? max? min? ternary if-then-else? ...)?
A simple-ish example (something like this maybe?):
    <Proxy "balancer://...">
        BalancerMember ...
        ...
        ProxySet \
            lbmethod=bybackendinfo \
            backendupdateseconds=30 \
            backendupdaterequests=100 \
            backendformula="%{BACKEND:uptime} -lt 120 ? 1 : %{BACKEND:workers-free} / %{BACKEND:workers-max} * 100"
    </Proxy>
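
To make that concrete with the sample curl response above: uptime=1448 is not
less than 120, so the formula would yield workers-free / workers-max * 100 =
255 / 256 * 100, roughly 99.6, presumably truncated into the 1-100 lbfactor
range; a member restarted less than two minutes ago would instead get a
minimal factor of 1 while it warms up. (This assumes the formula's result maps
directly onto lbfactor, per #3 above.)
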
- [Near-long-term] Help adding X-Backend-Info backend support and documentation
to various projects. Tomcat, php-fpm, and others(?) should be fairly easy places
to implement this and submit patches. This work does us no good if none of our
backends support it.
- [Long-term] Help adding X-Backend-Info frontend support and documentation to
various projects to help this become an "accepted ad-hoc standard"...or
something like that. Nginx, haproxy, and many others would be targets.
Worn out from writing all of this and hopeful that someone other than me
actually cares, I wish you all well today/tonight!
- Jim