On 26/07/2016 06:56 PM, Willy Tarreau wrote:
> On Tue, Jul 26, 2016 at 05:51:08PM +0200, Pavlos Parissis wrote:
>> In all my setups I have nbproc > 1, and after a lot of changes to how I
>> aggregate HAProxy stats and to what most people want to see on graphs,
>> I came up with something like the following:
>>
>> {
>>     "frontend": {
>>         "www.haproxy.org": {
>>             "bin": "999999999999",
>>             "lbtot": "555555",
>>             ...
>>         },
>>         "www.haproxy.com": {
>>             "bin": "999999999999",
>>             "lbtot": "555555",
>>             ...
>>         },
>>     },
>>     "backend": {
>>         "www.haproxy.org": {
>>             "bin": "999999999999",
>>             "lbtot": "555555",
>>             ...
>>             "server": {
>>                 "srv1": {
>>                     "bin": "999999999999",
>>                     "lbtot": "555555",
>>                     ...
>>                 },
>>                 ...
>>             },
>>         },
>>     },
>>     "haproxy": {
>>         "PipesFree": "555",
>>         ...
>>         "per_process": {
>>             "id1": {
>>                 "PipesFree": "555",
>>                 "Process_num": "1",
>>                 ...
>>             },
>>             "id2": {
>>                 "PipesFree": "555",
>>                 "Process_num": "2",
>>                 ...
>>             },
>>             ...
>>         },
>>     },
>>     "server": {
>>         "srv1": {
>>             "bin": "999999999999",
>>             "lbtot": "555555",
>>             ...
>>         },
>>         ...
>>     },
>> }
>>
>>
>> Let me explain a bit:
>>
>> - It is very useful and handy to know the stats for a server per backend,
>> but also across all backends. Thus, I include a top-level key 'server'
>> which holds the stats for each server across all backends. A few server
>> stats have to be excluded as they are meaningless in this context, for
>> example status, lastchg, check_duration, check_code and a few others.
>> For those which aren't counters but fixed numbers, you want to either
>> sum them (slim) or take the average (weight). I don't do the latter in
>> my setup.
>
> You probably have not looked at the output of "show stat typed", it
> gives you the nature of each value letting you know how to aggregate
> them (min, max, avg, sum, pick any, etc).
>
I have seen it, but it isn't available on 1.6. It could simplify my code, so I
should give it a try.
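
To make concrete what my code does today for servers, here is a rough sketch in
Python. The rule table and the name aggregate_server are made up for this mail
and are not taken from my actual code. The table is maintained by hand; with
the typed output it could presumably be derived from the nature reported for
each field instead.

```python
from statistics import mean

# Hypothetical, hand-maintained rule table: field name -> aggregation rule.
# Fields missing from the table (status, lastchg, check_duration,
# check_code, ...) are dropped because they are meaningless once merged.
RULES = {
    "bin": sum,      # counter: sum across backends
    "lbtot": sum,    # counter: sum across backends
    "slim": sum,     # fixed number: summed
    "weight": mean,  # fixed number: averaged
}


def aggregate_server(samples):
    """Merge the stats of one server as seen in several backends.

    `samples` is a list of dicts mapping field names to string values.
    Empty or missing values are skipped.
    """
    merged = {}
    for field, rule in RULES.items():
        values = [int(s[field]) for s in samples if s.get(field, "") != ""]
        if values:
            merged[field] = rule(values)
    return merged
```

Calling aggregate_server() with the rows of srv1 from every backend returns one
dict for the top-level 'server' key. Anything not listed in RULES is left out.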
>> - Aggregation across multiple processes for haproxy stats (show info output)
>
> It's not only "show info", this one reports only the process health.
>
>> As you can see, I provide stats per process and across all processes.
>> It has proven very useful to know the CPU utilization per process. We
>> depend on the kernel to distribute incoming connections to all processes,
>> and so far it works very well, but sometimes you see a single process
>> consuming a lot of CPU, and if you don't provide percentiles or stats per
>> process then you are going to miss it. The metrics about uptime, version,
>> description and a few others can be excluded from the aggregation.
>
> These last ones are in the "pick any" type of aggregation I was talking about.
>
>> - nbproc > 1 and aggregation for frontend/backend/server
>> My proposal doesn't cover stats for frontend/backend/server per haproxy
>> process.
>
> But that's precisely the limitation I'm reporting :-)
>
>> The stats are already aggregated and a few metrics are excluded, for
>> example all the status stuff. Each process performs healthchecking, so
>> they act as little brains which never agree on the status of a server,
>> as they run their checks at different intervals.
>
> Absolutely, but at least you want to see their stats. For example how many
> times a server has switched state per process then in total (meaning a
> proportional amount of possibly visible issues).
>
True, but in setups with ECMP in front of N HAProxy nodes which run in nbproc
mode, you offload application healthchecking to a dedicated daemon which runs
on the servers (service discovery + service availability with consul/zookeeper
stuff) and you only run TCP checks from HAProxy.
In our setup we don't really care about how many times a server flapped; it
doesn't tell us anything we don't already know, namely that the application is
in a broken state.
But other people may find it useful.
> My issue is that if the *format* doesn't support per-process stats, we'll have
> to emit a new format 3 months later for all the people who want to process it.
> We've reworked the stats dump to put an end to the problem where depending on
> the output format you used to have different types of information, and there
> was no single representation carrying them all at once. For me now it's
> essential that if we prepare a new format it's not stripped down from the
> info people need, otherwise it will automatically engender yet another format.
>
Agreed. I am fine with giving per-process stats for servers/frontends/backends.
Adding another top-level key 'per_process' to my proposal should be a good
start:
{
    "per_process": {
        "proc1": {
            "frontend": {
                "www.haproxy.org": {
                    "bin": "999999999999",
                    "lbtot": "555555",
                    ...
                },
                "www.haproxy.com": {
                    "bin": "999999999999",
                    "lbtot": "555555",
                    ...
                },
            },
            "backend": {
                "www.haproxy.org": {
                    "bin": "999999999999",
                    "lbtot": "555555",
                    ...
                    "server": {
                        "srv1": {
                            "bin": "999999999999",
                            "lbtot": "555555",
                            ...
                        },
                        ...
                    },
                },
            },
            "haproxy": {
                "PipesFree": "555",
                ...
            },
            "server": {
                "srv1": {
                    "bin": "999999999999",
                    "lbtot": "555555",
                    ...
                },
                ...
            },
        },
        ...
    },
    "frontend": {
        "www.haproxy.org": {
            "bin": "999999999999",
            "lbtot": "555555",
            ...
        },
        "www.haproxy.com": {
            "bin": "999999999999",
            "lbtot": "555555",
            ...
        },
    },
    "backend": {
        "www.haproxy.org": {
            "bin": "999999999999",
            "lbtot": "555555",
            ...
            "server": {
                "srv1": {
                    "bin": "999999999999",
                    "lbtot": "555555",
                    ...
                },
                ...
            },
        },
    },
    "haproxy": {
        "PipesFree": "555",
        ...
    },
    "server": {
        "srv1": {
            "bin": "999999999999",
            "lbtot": "555555",
            ...
        },
        ...
    },
}
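
To show how the top-level keys relate to 'per_process', here is a minimal
sketch in Python of how such a document could be assembled. The function names
build_document and sum_counters are invented for this example. It handles only
frontends and treats every field as a counter to be summed, which is a
simplification; real code would pick the rule per field.

```python
from collections import defaultdict


def sum_counters(dicts):
    """Sum every field found in a list of flat stat dicts."""
    total = defaultdict(int)
    for d in dicts:
        for field, value in d.items():
            total[field] += int(value)
    return dict(total)


def build_document(per_process):
    """Build the proposed layout from per-process frontend stats.

    `per_process` maps a process key ("proc1", ...) to
    {"frontend": {name: {field: value}}}.
    """
    # Group the per-process stats of each frontend by frontend name.
    by_name = defaultdict(list)
    for proc in per_process.values():
        for name, stats in proc.get("frontend", {}).items():
            by_name[name].append(stats)
    # Keep the raw per-process data and add the aggregate next to it.
    return {
        "per_process": per_process,
        "frontend": {name: sum_counters(s) for name, s in by_name.items()},
    }
```

Note that the aggregated values come out as integers here, while the layout
above shows strings; which one the format should use is a separate question.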
What do you think?
Cheers,
Pavlos

